Repository: github.com/pandas-dev/pandas

Method extractall

Source: pandas/core/strings/accessor.py, lines 3460–3535


Signature: extractall(self, pat, flags: int = 0) -> DataFrame

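The `flags` argument in this signature is passed to Python's `re` module when the pattern is compiled, so standard flags can be combined with `|`. Below is a minimal sketch that assumes pandas is installed. The sample Series and pattern are made up for illustration.

```python
import re

import pandas as pd

# One subject string spanning two lines, with letters in mixed case
s = pd.Series(["x: A1\ny: b2"])
pat = r"^\w: ([ab]\d)$"

# Without flags, [ab] is case-sensitive and ^/$ anchor to the whole string,
# so nothing matches and the result is an empty DataFrame
no_flags = s.str.extractall(pat)

# IGNORECASE lets [ab] match "A"; MULTILINE lets ^ and $ match at each line
both = s.str.extractall(pat, flags=re.IGNORECASE | re.MULTILINE)
print(both)
```

Each flag fixes one of the two reasons the match fails, so only the combination returns rows.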

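The method carries the `@forbid_nonstring_types(["bytes"])` decorator. As a result, a Series whose values are inferred as bytes is rejected with a `TypeError`, even though the `.str` accessor itself accepts such a Series. The short check below assumes recent pandas behaviour. Decoding with `.str.decode` first is one way around the restriction.

```python
import pandas as pd

raw = pd.Series([b"a1", b"b2"])  # values are inferred as "bytes"

raised = False
try:
    raw.str.extractall(r"([ab])(\d)")
except TypeError as exc:
    raised = True
    print("rejected:", exc)

# Decoding to str first makes the values acceptable to extractall
decoded = raw.str.decode("utf-8").str.extractall(r"([ab])(\d)")
print(decoded)
```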
    @forbid_nonstring_types(["bytes"])
    def extractall(self, pat, flags: int = 0) -> DataFrame:
        r"""
        Extract capture groups in the regex `pat` as columns in a DataFrame.

        For each subject string in the Series, extract groups from all
        matches of regular expression `pat`. When each subject string in the
        Series has exactly one match, extractall(pat).xs(0, level='match')
        is the same as extract(pat).

        Parameters
        ----------
        pat : str
            Regular expression pattern with capturing groups.
        flags : int, default 0 (no flags)
            A ``re`` module flag, for example ``re.IGNORECASE``. These modify
            regular expression matching for things like case, spaces, etc.
            Multiple flags can be combined with the bitwise OR operator,
            for example ``re.IGNORECASE | re.MULTILINE``.

        Returns
        -------
        DataFrame
            A ``DataFrame`` with one row for each match, and one column for each
            group. Its rows have a ``MultiIndex`` with first levels that come from
            the subject ``Series``. The last level is named 'match' and indexes the
            matches in each item of the ``Series``. Any capture group names in
            regular expression `pat` will be used for column names; otherwise the
            0-based positions of the groups will be used.

        See Also
        --------
        extract : Returns first match only (not all matches).

        Examples
        --------
        A pattern with one group will return a DataFrame with one column.
        Indices with no matches will not appear in the result.

        >>> s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
        >>> s.str.extractall(r"[ab](\d)")
                 0
          match
        A 0      1
          1      2
        B 0      1

        Capture group names are used for column names of the result.

        >>> s.str.extractall(r"[ab](?P<digit>\d)")
                 digit
          match
        A 0          1
          1          2
        B 0          1

        A pattern with two groups will return a DataFrame with two columns.

        >>> s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
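The listing stops at the two-group example. The sketch below runs that example on the same Series and inspects the shape of the result. It assumes pandas is installed. The `groupby` count at the end is an illustration and is not part of the docstring.

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
result = s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
print(result)

# Named groups become column labels; the index gains a trailing "match" level
print(list(result.columns))      # ['letter', 'digit']
print(list(result.index.names))  # [None, 'match']

# There is one row per match, so counting rows per subject gives matches per
# element. "C" has no match and is absent from the result; reindex fills it with 0
counts = result.groupby(level=0).size().reindex(s.index, fill_value=0)
print(counts)
```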

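The docstring states that when each subject string has exactly one match, `extractall(pat).xs(0, level='match')` is the same as `extract(pat)`. You can check that claim directly. The example below assumes pandas is installed and uses a made-up Series in which every string matches exactly once.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"], index=["A", "B", "C"])
pat = r"(?P<letter>[a-c])(?P<digit>\d)"

all_matches = s.str.extractall(pat)
# Keep match number 0 and drop the "match" level, leaving the original labels
first = all_matches.xs(0, level="match")

# Raises AssertionError if the two frames differ in values, labels, or dtypes
pd.testing.assert_frame_equal(first, s.str.extract(pat))
```

If any element matched twice, `xs` would still keep only its first match. The equality holds only under the one-match-per-element condition the docstring states.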
Calls (1): str_extractall (function)
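The page lists `str_extractall` as the one function this method calls. Below is a simplified, hypothetical re-implementation of the same idea, using only `re` and public pandas constructors. `extractall_sketch` is an illustrative name, not a pandas function. It is not pandas' actual helper, and it only handles a flat (single-level) index.

```python
import re

import pandas as pd


def extractall_sketch(s: pd.Series, pat: str, flags: int = 0) -> pd.DataFrame:
    """Hypothetical simplified extractall: one row per match, one column per group."""
    regex = re.compile(pat, flags=flags)
    if regex.groups == 0:
        raise ValueError("pattern contains no capture groups")

    # Label each column with the group's name if it has one, else its 0-based position
    group_names = {num: name for name, num in regex.groupindex.items()}
    columns = [group_names.get(i + 1, i) for i in range(regex.groups)]

    rows, keys = [], []
    for label, subject in s.items():
        if not isinstance(subject, str):
            continue  # missing or non-string values produce no rows
        for match_num, m in enumerate(regex.finditer(subject)):
            rows.append(list(m.groups()))  # unmatched optional groups stay None
            keys.append((label, match_num))

    index = pd.MultiIndex.from_tuples(keys, names=[s.index.name, "match"])
    return pd.DataFrame(rows, index=index, columns=columns, dtype=object)


s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
print(extractall_sketch(s, r"(?P<letter>[ab])(?P<digit>\d)"))
```

One visible difference from pandas: an optional group that does not take part in a match comes back as `None` here, whereas pandas' own result shows NaN for it.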