
Function to_parquet

pandas/io/parquet.py:409–505

Write a DataFrame to the parquet format.

(
    df: DataFrame,
    path: FilePath | WriteBuffer[bytes] | None = None,
    engine: str = "auto",
    compression: ParquetCompressionOptions = "snappy",
    index: bool | None = None,
    storage_options: StorageOptions | None = None,
    partition_cols: list[str] | None = None,
    filesystem: Any = None,
    **kwargs,
)

Source from the content-addressed store, hash-verified

def to_parquet(
    df: DataFrame,
    path: FilePath | WriteBuffer[bytes] | None = None,
    engine: str = "auto",
    compression: ParquetCompressionOptions = "snappy",
    index: bool | None = None,
    storage_options: StorageOptions | None = None,
    partition_cols: list[str] | None = None,
    filesystem: Any = None,
    **kwargs,
) -> bytes | None:
    """
    Write a DataFrame to the parquet format.

    Parameters
    ----------
    df : DataFrame
    path : str, path object, file-like object, or None, default None
        String, path object (implementing ``os.PathLike[str]``), or file-like
        object implementing a binary ``write()`` function. If None, the result
        is returned as bytes. If a string, it will be used as Root Directory
        path when writing a partitioned dataset. The engine fastparquet does
        not accept file-like objects.
    engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
        Parquet library to use. If 'auto', then the option
        ``io.parquet.engine`` is used. The default ``io.parquet.engine``
        behavior is to try 'pyarrow', falling back to 'fastparquet' if
        'pyarrow' is unavailable.

        When using the ``'pyarrow'`` engine and no storage options are provided
        and a filesystem is implemented by both ``pyarrow.fs`` and ``fsspec``
        (e.g. "s3://"), then the ``pyarrow.fs`` filesystem is attempted first.
        Use the filesystem keyword with an instantiated fsspec filesystem
        if you wish to use its implementation.
    compression : {'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None},
        default 'snappy'. Name of the compression to use. Use ``None``
        for no compression.
    index : bool, default None
        If ``True``, include the dataframe's index(es) in the file output. If
        ``False``, they will not be written to the file.
        If ``None``, similar to ``True`` the dataframe's index(es)
        will be saved. However, instead of being saved as values,
        the RangeIndex will be stored as a range in the metadata so it
        doesn't require much space and is faster. Other indexes will
        be included as columns in the file output.
    partition_cols : str or list, optional, default None
        Column names by which to partition the dataset.
        Columns are partitioned in the order they are given.
        Must be None if path is not a string.
    storage_options : dict, optional
        Extra options that make sense for a particular storage connection, e.g.
        host, port, username, password, etc. For HTTP(S) URLs the key-value
        pairs are forwarded to ``urllib.request.Request`` as header options.
        For other URLs (e.g. starting with "s3://", and "gcs://") the
        key-value pairs are forwarded to ``fsspec.open``. Please see ``fsspec``
        and ``urllib`` for more details, and for more examples on storage
        options refer `here <https://pandas.pydata.org/docs/user_guide/io.html?
        highlight=storage_options#reading-writing-remote-files>`_.

Callers 3

to_parquet · Method · 0.90

check_error_on_write · Method · 0.90

check_external_error_on_write · Method · 0.90

Calls 2

get_engine · Function · 0.70

write · Method · 0.45

Tested by 2

check_error_on_write · Method · 0.72

check_external_error_on_write · Method · 0.72