hub / github.com/pandas-dev/pandas / duplicated

Method duplicated

pandas/core/frame.py:7965–8097

Return boolean Series denoting duplicate rows. Considering certain columns is optional.

(
        self,
        subset: Hashable | Iterable[Hashable] | None = None,
        keep: DropKeep = "first",
    )
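
The two parameters in this signature can be exercised as follows. This is a minimal usage sketch that reuses the ramen data from the docstring; the variable names (`first`, `last`, `every`, `by_brand`) are illustrative and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)

# Default keep="first": only the later copy of an identical row is flagged.
first = df.duplicated()

# keep="last": the earlier copy is flagged and the last one is kept.
last = df.duplicated(keep="last")

# keep=False: every row that has a duplicate anywhere is flagged.
every = df.duplicated(keep=False)

# subset restricts the comparison to the named columns.
by_brand = df.duplicated(subset=["brand"])

print(first.tolist())     # [False, True, False, False, False]
print(by_brand.tolist())  # [False, True, False, True, True]
```

Only rows 0 and 1 match on all three columns, so they are the only rows affected by the choice of `keep` when `subset` is left at its default.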

Source from the content-addressed store, hash-verified

7963         return result
7964
7965     def duplicated(
7966         self,
7967         subset: Hashable | Iterable[Hashable] | None = None,
7968         keep: DropKeep = "first",
7969     ) -> Series:
7970         """
7971         Return boolean Series denoting duplicate rows.
7972
7973         Considering certain columns is optional.
7974
7975         Parameters
7976         ----------
7977         subset : column label or iterable of labels, optional
7978             Only consider certain columns for identifying duplicates, by
7979             default use all of the columns.
7980         keep : {'first', 'last', False}, default 'first'
7981             Determines which duplicates (if any) to mark.
7982
7983             - ``first`` : Mark duplicates as ``True`` except for the first occurrence.
7984             - ``last`` : Mark duplicates as ``True`` except for the last occurrence.
7985             - False : Mark all duplicates as ``True``.
7986
7987         Returns
7988         -------
7989         Series
7990             Boolean series for each duplicated rows.
7991
7992         See Also
7993         --------
7994         Index.duplicated : Equivalent method on index.
7995         Series.duplicated : Equivalent method on Series.
7996         Series.drop_duplicates : Remove duplicate values from Series.
7997         DataFrame.drop_duplicates : Remove duplicate values from DataFrame.
7998
7999         Examples
8000         --------
8001         Consider dataset containing ramen rating.
8002
8003         >>> df = pd.DataFrame(
8004         ...     {
8005         ...         "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
8006         ...         "style": ["cup", "cup", "cup", "pack", "pack"],
8007         ...         "rating": [4, 4, 3.5, 15, 5],
8008         ...     }
8009         ... )
8010         >>> df
8011             brand style  rating
8012         0  Yum Yum   cup     4.0
8013         1  Yum Yum   cup     4.0
8014         2  Indomie   cup     3.5
8015         3  Indomie  pack    15.0
8016         4  Indomie  pack     5.0
8017
8018         By default, for each set of duplicated values, the first occurrence
8019         is set on False and all others on True.
8020
8021         >>> df.duplicated()
8022         0    False

Callers 11

drop_duplicates · Method · 0.95
test_drop_duplicates · Function · 0.95
test_duplicated_keep · Function · 0.95
test_duplicated_nan_none · Function · 0.95
test_duplicated_subset · Function · 0.95
set_index · Method · 0.45
explode · Method · 0.45
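
The caller list suggests that `drop_duplicates` builds on this method. The sketch below only checks the observable relationship between the two public methods on a small made-up frame. It does not reproduce the library's internal code path.

```python
import pandas as pd

# Illustrative data, not taken from the pandas test suite.
df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie"],
        "style": ["cup", "cup", "pack"],
    }
)

# Rows flagged by duplicated() are the ones drop_duplicates() discards.
mask = df.duplicated(keep="first")
filtered = df[~mask]
dropped = df.drop_duplicates(keep="first")

# Both results keep rows 0 and 2 with their original index labels.
same = filtered.equals(dropped)
print(same)
```

Inverting the boolean mask is also a convenient way to inspect which rows would be removed before calling `drop_duplicates`.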

Calls 6

items · Method · 0.95
get_group_index · Function · 0.90
Index · Class · 0.85
duplicated · Function · 0.85
__finalize__ · Method · 0.80
_constructor_sliced · Method · 0.45
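
The callees above hint at one plausible shape for the computation: iterate over the columns, turn each into integer codes, combine the codes into one id per row, and flag repeated ids. The sketch below is an assumption-laden illustration built only on public APIs. `duplicated_sketch` is a hypothetical helper, and `np.ravel_multi_index` stands in for the internal `get_group_index`, whose exact behavior is not shown in this listing.

```python
import numpy as np
import pandas as pd


def duplicated_sketch(df: pd.DataFrame, keep="first") -> pd.Series:
    """Hypothetical re-creation of row-duplicate detection (not pandas' code)."""
    codes, sizes = [], []
    for _, column in df.items():
        # factorize maps each distinct value to an integer code;
        # missing values get -1, so shift by one to keep codes non-negative.
        col_codes, uniques = pd.factorize(column)
        codes.append(col_codes + 1)
        sizes.append(len(uniques) + 1)

    # Combine per-column codes into a single integer id per row.
    # Assumption: the product of sizes is small enough not to overflow.
    row_ids = np.ravel_multi_index(tuple(codes), tuple(sizes))

    # Rows sharing an id are identical across all columns.
    return pd.Series(row_ids, index=df.index).duplicated(keep=keep)


df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)
result = duplicated_sketch(df)
print(result.tolist())  # [False, True, False, False, False]
```

Reducing each row to one integer lets a single one-dimensional duplicate check cover any number of columns. The sketch ignores overflow, so it is only suitable for small frames.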