hub / github.com/pandas-dev/pandas / to_parquet

Function to_parquet

pandas/io/parquet.py:409–505

Write a DataFrame to the parquet format.

(
    df: DataFrame,
    path: FilePath | WriteBuffer[bytes] | None = None,
    engine: str = "auto",
    compression: ParquetCompressionOptions = "snappy",
    index: bool | None = None,
    storage_options: StorageOptions | None = None,
    partition_cols: list[str] | None = None,
    filesystem: Any = None,
    **kwargs,
)
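
The signature above can be exercised through the public `DataFrame.to_parquet` method, which is listed as a caller of this function. The following is a minimal sketch, not part of the pandas source: the helper names `to_parquet_bytes` and `read_parquet_bytes` are hypothetical, and the code assumes the optional `pyarrow` package is installed. It shows that leaving `path` as `None` returns the serialized file as bytes, and how the `index` flag changes what is written.

```python
import io

import pandas as pd


def to_parquet_bytes(df: pd.DataFrame, keep_index: bool = True) -> bytes:
    """Serialize df to Parquet in memory (hypothetical helper, not in pandas)."""
    # path=None makes to_parquet return the file content as bytes.
    # engine="pyarrow" is an assumption: it needs the pyarrow package.
    return df.to_parquet(
        path=None,
        engine="pyarrow",
        compression="snappy",
        index=keep_index,
    )


def read_parquet_bytes(payload: bytes) -> pd.DataFrame:
    """Read Parquet bytes back into a DataFrame (hypothetical helper)."""
    # read_parquet accepts a binary file-like object.
    return pd.read_parquet(io.BytesIO(payload), engine="pyarrow")


if __name__ == "__main__":
    frame = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
    # Index kept: the labels x, y, z come back as the index.
    print(read_parquet_bytes(to_parquet_bytes(frame)))
    # Index dropped: the result has a fresh default index.
    print(read_parquet_bytes(to_parquet_bytes(frame, keep_index=False)))
```

Pinning `engine="pyarrow"` rather than relying on `"auto"` keeps the sketch independent of the local `io.parquet.engine` setting.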

Source from the content-addressed store, hash-verified

def to_parquet(
    df: DataFrame,
    path: FilePath | WriteBuffer[bytes] | None = None,
    engine: str = "auto",
    compression: ParquetCompressionOptions = "snappy",
    index: bool | None = None,
    storage_options: StorageOptions | None = None,
    partition_cols: list[str] | None = None,
    filesystem: Any = None,
    **kwargs,
) -> bytes | None:
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    df : DataFrame
    path : str, path object, file-like object, or None, default None
        String, path object (implementing ``os.PathLike[str]``), or file-like
        object implementing a binary ``write()`` function. If None, the result
        is returned as bytes. If a string, it will be used as Root Directory
        path when writing a partitioned dataset. The engine fastparquet does
        not accept file-like objects.
    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.

        When using the ``'pyarrow'`` engine and no storage options are provided
        and a filesystem is implemented by both ``pyarrow.fs`` and ``fsspec``
        (e.g. "s3://"), then the ``pyarrow.fs`` filesystem is attempted first.
        Use the filesystem keyword with an instantiated fsspec filesystem
        if you wish to use its implementation.
    compression : {'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None},
        default 'snappy'. Name of the compression to use. Use ``None``
        for no compression.
    index : bool, default None
        If ``True``, include the dataframe's index(es) in the file output. If
        ``False``, they will not be written to the file.
        If ``None``, similar to ``True`` the dataframe's index(es)
        will be saved. However, instead of being saved as values,
        the RangeIndex will be stored as a range in the metadata so it
        doesn't require much space and is faster. Other indexes will
        be included as columns in the file output.
    partition_cols : str or list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.
        Must be None if path is not a string.
    storage_options : dict, optional
        Extra options that make sense for a particular storage connection, e.g.
        host, port, username, password, etc. For HTTP(S) URLs the key-value
        pairs are forwarded to ``urllib.request.Request`` as header options.
        For other URLs (e.g. starting with "s3://", and "gcs://") the
        key-value pairs are forwarded to ``fsspec.open``. Please see ``fsspec``
        and ``urllib`` for more details, and for more examples on storage
        options refer `here <https://pandas.pydata.org/docs/user_guide/io.html?
        highlight=storage_options#reading-writing-remote-files>`_.
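
The `partition_cols` behavior described in the docstring can be illustrated with a short sketch. It is an illustration under stated assumptions, not code from pandas: `write_partitioned` is a hypothetical helper, the `pyarrow` engine is assumed to be installed, and the column names are made up. With that engine, each distinct value of a partition column typically becomes a `column=value` subdirectory under the root path.

```python
import os
import tempfile

import pandas as pd


def write_partitioned(df: pd.DataFrame, root: str, cols: list[str]) -> list[str]:
    """Write df as a partitioned dataset under root (hypothetical helper).

    Returns the sorted top-level entries created in the root directory.
    """
    # path must be a string when partition_cols is given; it is treated
    # as the root directory of the dataset rather than a single file.
    # engine="pyarrow" is an assumption: it needs the pyarrow package.
    df.to_parquet(root, engine="pyarrow", partition_cols=cols)
    return sorted(os.listdir(root))


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sales = pd.DataFrame(
        {"year": [2023, 2023, 2024], "amount": [10.0, 20.0, 30.0]}
    )
    with tempfile.TemporaryDirectory() as tmp:
        # One subdirectory per distinct value of "year" is expected.
        print(write_partitioned(sales, tmp, ["year"]))
```

Passing several column names nests the directories in the order given, which matches the docstring's note that columns are partitioned in order.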

Callers 3

to_parquet · Method · 0.90
check_error_on_write · Method · 0.90

Calls 2

get_engine · Function · 0.70
write · Method · 0.45

Tested by 2

check_error_on_write · Method · 0.72