hub / github.com/pandas-dev/pandas / duplicated

Method duplicated

pandas/core/frame.py:7965–8097

Return boolean Series denoting duplicate rows. Considering certain columns is optional.

(
    self,
    subset: Hashable | Iterable[Hashable] | None = None,
    keep: DropKeep = "first",
)
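As a brief illustration of the signature, the sketch below calls `duplicated` with no arguments, with a single column label, and with a list of labels. It reuses the ramen data from the method's docstring; the variable names (`all_cols`, `by_brand`, `by_brand_style`) are only illustrative.

```python
import pandas as pd

# Same data as the docstring example.
df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)

# Default: every column takes part in the comparison.
all_cols = df.duplicated()

# A single label restricts the comparison to that column.
by_brand = df.duplicated(subset="brand")

# An iterable of labels compares the rows on those columns only.
by_brand_style = df.duplicated(subset=["brand", "style"])

print(all_cols.tolist())        # [False, True, False, False, False]
print(by_brand.tolist())        # [False, True, False, True, True]
print(by_brand_style.tolist())  # [False, True, False, False, True]
```

Narrowing `subset` generally marks more rows, since fewer columns have to agree for two rows to count as equal.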

Source from the content-addressed store, hash-verified

7963	        return result
7964
7965	    def duplicated(
7966	        self,
7967	        subset: Hashable | Iterable[Hashable] | None = None,
7968	        keep: DropKeep = "first",
7969	    ) -> Series:
7970	        """
7971	        Return boolean Series denoting duplicate rows.
7972
7973	        Considering certain columns is optional.
7974
7975	        Parameters
7976	        ----------
7977	        subset : column label or iterable of labels, optional
7978	            Only consider certain columns for identifying duplicates, by
7979	            default use all of the columns.
7980	        keep : {'first', 'last', False}, default 'first'
7981	            Determines which duplicates (if any) to mark.
7982
7983	            - ``first`` : Mark duplicates as ``True`` except for the first occurrence.
7984	            - ``last`` : Mark duplicates as ``True`` except for the last occurrence.
7985	            - False : Mark all duplicates as ``True``.
7986
7987	        Returns
7988	        -------
7989	        Series
7990	            Boolean series for each duplicated rows.
7991
7992	        See Also
7993	        --------
7994	        Index.duplicated : Equivalent method on index.
7995	        Series.duplicated : Equivalent method on Series.
7996	        Series.drop_duplicates : Remove duplicate values from Series.
7997	        DataFrame.drop_duplicates : Remove duplicate values from DataFrame.
7998
7999	        Examples
8000	        --------
8001	        Consider dataset containing ramen rating.
8002
8003	        >>> df = pd.DataFrame(
8004	        ...     {
8005	        ...         "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
8006	        ...         "style": ["cup", "cup", "cup", "pack", "pack"],
8007	        ...         "rating": [4, 4, 3.5, 15, 5],
8008	        ...     }
8009	        ... )
8010	        >>> df
8011	             brand style  rating
8012	        0  Yum Yum   cup     4.0
8013	        1  Yum Yum   cup     4.0
8014	        2  Indomie   cup     3.5
8015	        3  Indomie  pack    15.0
8016	        4  Indomie  pack     5.0
8017
8018	        By default, for each set of duplicated values, the first occurrence
8019	        is set on False and all others on True.
8020
8021	        >>> df.duplicated()
8022	        0    False
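The three `keep` settings described in the docstring can be compared side by side. The sketch below uses the docstring's ramen data, where rows 0 and 1 are the only pair that matches on every column; the names `keep_first`, `keep_last` and `keep_none` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)

# keep="first": the first occurrence stays False, later copies are True.
keep_first = df.duplicated(keep="first")

# keep="last": the last occurrence stays False, earlier copies are True.
keep_last = df.duplicated(keep="last")

# keep=False: every member of a duplicated group is True.
keep_none = df.duplicated(keep=False)

print(keep_first.tolist())  # [False, True, False, False, False]
print(keep_last.tolist())   # [True, False, False, False, False]
print(keep_none.tolist())   # [True, True, False, False, False]
```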

Callers 11

drop_duplicates · Method · 0.95

test_drop_duplicates · Function · 0.95

test_duplicated_with_misspelled_column_name · Function · 0.95

test_duplicated_implemented_no_recursion · Function · 0.95

test_duplicated_keep · Function · 0.95

test_duplicated_nan_none · Function · 0.95

test_duplicated_subset · Function · 0.95

test_duplicated_on_empty_frame · Function · 0.95

test_frame_datetime64_duplicated · Function · 0.95

set_index · Method · 0.45

explode · Method · 0.45
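`drop_duplicates` is listed as a caller, which suggests it filters the frame with the negated mask from `duplicated`. The sketch below checks that equivalence on the docstring data. It shows the observable relationship, not necessarily the exact code path inside pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)

# Rows where the mask is False are the ones drop_duplicates keeps.
mask = df.duplicated()
filtered = df[~mask]
same_default = filtered.equals(df.drop_duplicates())

# The same relationship should hold when subset and keep are passed through.
filtered_brand = df[~df.duplicated(subset=["brand"], keep="last")]
same_subset = filtered_brand.equals(
    df.drop_duplicates(subset=["brand"], keep="last")
)

print(same_default, same_subset)
```

Using the mask directly is handy when the duplicated rows themselves are of interest, for example `df[df.duplicated(keep=False)]` to inspect every member of each duplicate group.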

Calls 6

items · Method · 0.95

get_group_index · Function · 0.90

Index · Class · 0.85

duplicated · Function · 0.85

__finalize__ · Method · 0.80

_constructor_sliced · Method · 0.45
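The call list hints at the overall shape of the method: iterate over the columns, reduce each row to a single group identifier, then run a one-dimensional duplicate check on those identifiers. The sketch below imitates that idea using only public API (`pd.factorize` and `Series.duplicated`). The function name `duplicated_sketch` and the way the codes are combined are assumptions made for illustration; the real method relies on the internal `get_group_index` helper, whose details are not shown on this page.

```python
import numpy as np
import pandas as pd


def duplicated_sketch(df, subset=None, keep="first"):
    """Hypothetical re-creation of the row-duplicate check (not pandas' code).

    Assumes unique column labels and few enough distinct values that the
    combined int64 identifier does not overflow.
    """
    if subset is None:
        columns = list(df.columns)
    elif isinstance(subset, str) or not hasattr(subset, "__iter__"):
        columns = [subset]  # a single label
    else:
        columns = list(subset)

    # One integer identifier per row, built column by column.
    ids = np.zeros(len(df), dtype="int64")
    for name in columns:
        # factorize maps each distinct value to a code; missing values get -1.
        codes, uniques = pd.factorize(df[name])
        # Shift codes by one so -1 becomes 0, then append as a new "digit".
        ids = ids * (len(uniques) + 1) + (codes + 1)

    # Rows with equal identifiers agree on every selected column.
    return pd.Series(ids, index=df.index).duplicated(keep=keep)


df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)

sketch_result = duplicated_sketch(df)
print(sketch_result.equals(df.duplicated()))
```

Collapsing each row to one integer turns a multi-column comparison into a single hash-based pass over a 1-D array, which is presumably why the method delegates to a group-index helper rather than comparing rows pairwise.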

Tested by 8

test_drop_duplicates · Function · 0.76

test_duplicated_with_misspelled_column_name · Function · 0.76

test_duplicated_implemented_no_recursion · Function · 0.76

test_duplicated_keep · Function · 0.76

test_duplicated_nan_none · Function · 0.76

test_duplicated_subset · Function · 0.76

test_duplicated_on_empty_frame · Function · 0.76

test_frame_datetime64_duplicated · Function · 0.76
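The test names point at a few edge cases: empty frames, misspelled column labels, and missing values. The checks below are a hypothetical sketch of that behaviour written with plain Python rather than pytest. They are not the actual test bodies from the pandas suite, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Empty frame: the result is an empty boolean Series.
empty_result = pd.DataFrame(columns=["a", "b"]).duplicated()

# Misspelled label: "a" is not a column (the real label is "A").
df = pd.DataFrame({"A": [0, 0, 1], "B": [0, 0, 1]})
try:
    df.duplicated(subset=["a"])
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Missing values: NaN entries are treated as equal to each other.
nan_result = pd.DataFrame({"x": [np.nan, np.nan, 1.0]}).duplicated()

print(len(empty_result), raised_key_error, nan_result.tolist())
# 0 True [False, True, False]
```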