def to_parquet(
    df: DataFrame,
    path: FilePath | WriteBuffer[bytes] | None = None,
    engine: str = "auto",
    compression: ParquetCompressionOptions = "snappy",
    index: bool | None = None,
    storage_options: StorageOptions | None = None,
    partition_cols: list[str] | None = None,
    filesystem: Any = None,
    **kwargs,
) -> bytes | None:
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    df : DataFrame
    path : str, path object, file-like object, or None, default None
        String, path object (implementing ``os.PathLike[str]``), or file-like
        object implementing a binary ``write()`` function. If None, the result
        is returned as bytes. If a string, it is used as the root directory
        path when writing a partitioned dataset. The fastparquet engine does
        not accept file-like objects.
    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.

        When using the ``'pyarrow'`` engine and no storage options are provided
        and a filesystem is implemented by both ``pyarrow.fs`` and ``fsspec``
        (e.g. "s3://"), then the ``pyarrow.fs`` filesystem is attempted first.
        To use the fsspec implementation instead, pass an instantiated fsspec
        filesystem via the ``filesystem`` keyword.
    compression : {'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None}, default 'snappy'
        Name of the compression to use. Use ``None`` for no compression.
    index : bool, default None
        If ``True``, include the DataFrame's index(es) in the file output. If
        ``False``, they will not be written to the file.
        If ``None``, the DataFrame's index(es) are saved, as with ``True``.
        However, a RangeIndex is stored as a range in the metadata rather
        than as values, so it requires little space and is faster. Other
        indexes are included as columns in the file output.
    partition_cols : str or list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.
        Must be None if path is not a string.
    storage_options : dict, optional
        Extra options that make sense for a particular storage connection, e.g.
        host, port, username, password, etc. For HTTP(S) URLs the key-value
        pairs are forwarded to ``urllib.request.Request`` as header options.
        For other URLs (e.g. starting with "s3://" or "gcs://") the
        key-value pairs are forwarded to ``fsspec.open``. Please see ``fsspec``
        and ``urllib`` for more details, and for more examples on storage
        options refer `here <https://pandas.pydata.org/docs/user_guide/io.html?
        highlight=storage_options#reading-writing-remote-files>`_.