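A minimal sketch, reusing the ramen data from the Examples section of the docstring, of how `keep` and `subset` change the mask returned by `DataFrame.duplicated`. The variable names (`default_mask`, `unique_rows`, and so on) are illustrative only and not part of pandas.

```python
import pandas as pd

# Same data as the docstring's Examples section.
df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)

# keep="first" (default): every repeat after the first occurrence is True.
default_mask = df.duplicated()
print(default_mask.tolist())  # [False, True, False, False, False]

# keep="last": every repeat before the last occurrence is True.
last_mask = df.duplicated(keep="last")
print(last_mask.tolist())  # [True, False, False, False, False]

# keep=False: all members of a duplicated group are True.
all_mask = df.duplicated(keep=False)
print(all_mask.tolist())  # [True, True, False, False, False]

# subset: compare rows on the listed columns only.
brand_mask = df.duplicated(subset=["brand"])
print(brand_mask.tolist())  # [False, True, False, True, True]

# Inverting the mask keeps one row per group of duplicates.
unique_rows = df[~default_mask]
```

Filtering with `~default_mask` should give the same rows as `df.drop_duplicates()`, which is listed under See Also.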
| 7963 | return result |
| 7964 | |
| 7965 | def duplicated( |
| 7966 | self, |
| 7967 | subset: Hashable | Iterable[Hashable] | None = None, |
| 7968 | keep: DropKeep = "first", |
| 7969 | ) -> Series: |
| 7970 | """ |
| 7971 | Return boolean Series denoting duplicate rows. |
| 7972 | |
| 7973 | Optionally, only certain columns are considered. |
| 7974 | |
| 7975 | Parameters |
| 7976 | ---------- |
| 7977 | subset : column label or iterable of labels, optional |
| 7978 | Only consider certain columns for identifying duplicates. By |
| 7979 | default, use all of the columns. |
| 7980 | keep : {'first', 'last', False}, default 'first' |
| 7981 | Determines which duplicates (if any) to mark. |
| 7982 | |
| 7983 | - ``first`` : Mark duplicates as ``True`` except for the first occurrence. |
| 7984 | - ``last`` : Mark duplicates as ``True`` except for the last occurrence. |
| 7985 | - ``False`` : Mark all duplicates as ``True``. |
| 7986 | |
| 7987 | Returns |
| 7988 | ------- |
| 7989 | Series |
| 7990 | Boolean Series indicating, for each row, whether it is a duplicate. |
| 7991 | |
| 7992 | See Also |
| 7993 | -------- |
| 7994 | Index.duplicated : Equivalent method on index. |
| 7995 | Series.duplicated : Equivalent method on Series. |
| 7996 | Series.drop_duplicates : Remove duplicate values from Series. |
| 7997 | DataFrame.drop_duplicates : Remove duplicate values from DataFrame. |
| 7998 | |
| 7999 | Examples |
| 8000 | -------- |
| 8001 | Consider a dataset containing ramen ratings. |
| 8002 | |
| 8003 | >>> df = pd.DataFrame( |
| 8004 | ... { |
| 8005 | ... "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"], |
| 8006 | ... "style": ["cup", "cup", "cup", "pack", "pack"], |
| 8007 | ... "rating": [4, 4, 3.5, 15, 5], |
| 8008 | ... } |
| 8009 | ... ) |
| 8010 | >>> df |
| 8011 | brand style rating |
| 8012 | 0 Yum Yum cup 4.0 |
| 8013 | 1 Yum Yum cup 4.0 |
| 8014 | 2 Indomie cup 3.5 |
| 8015 | 3 Indomie pack 15.0 |
| 8016 | 4 Indomie pack 5.0 |
| 8017 | |
| 8018 | By default, for each set of duplicated values, the first occurrence |
| 8019 | is set to False and all others to True. |
| 8020 | |
| 8021 | >>> df.duplicated() |
| 8022 | 0 False |