
Class SparseArray

pandas/core/arrays/sparse/array.py:296–1887

An ExtensionArray for storing sparse data. SparseArray efficiently stores data with a high frequency of a specific fill value (e.g., zeros), saving memory by only retaining non-fill elements and their indices. This class is particularly useful for large datasets where most values are redundant.
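As a rough illustration of "only retaining non-fill elements and their indices", the sketch below builds a small SparseArray and inspects what it stores. The sample data and variable names are made up for this example; `sp_values`, `sp_index`, and `density` are public attributes of `SparseArray`.

```python
import numpy as np
import pandas as pd

# Illustrative data: only two of the eight entries differ from the fill value.
dense = np.array([0, 0, 1, 0, 2, 0, 0, 0])
arr = pd.arrays.SparseArray(dense, fill_value=0)

# The non-fill elements that are physically stored.
stored_values = arr.sp_values

# The positions of those elements within the full-length array.
stored_positions = arr.sp_index.to_int_index().indices

# Fraction of positions that are stored (stored points / total length).
density = arr.density

# Converting back to a dense ndarray re-inserts the fill value.
round_trip = np.asarray(arr)

print(stored_values, stored_positions, density)
```

The lower the density, the larger the saving compared with the dense array, since the fill value itself is kept only once.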


@set_module("pandas.arrays")
class SparseArray(OpsMixin, PandasObject, ExtensionArray):
    """
    An ExtensionArray for storing sparse data.

    SparseArray efficiently stores data with a high frequency of a
    specific fill value (e.g., zeros), saving memory by only retaining
    non-fill elements and their indices. This class is particularly
    useful for large datasets where most values are redundant.

    Parameters
    ----------
    data : array-like or scalar
        A dense array of values to store in the SparseArray. This may contain
        `fill_value`.
    sparse_index : SparseIndex, optional
        Index indicating the locations of sparse elements.
    fill_value : scalar, optional
        Elements in data that are ``fill_value`` are not stored in the
        SparseArray. For memory savings, this should be the most common value
        in `data`. By default, `fill_value` depends on the dtype of `data`:

        =========== ==========
        data.dtype  na_value
        =========== ==========
        float       ``np.nan``
        int         ``0``
        bool        False
        datetime64  ``pd.NaT``
        timedelta64 ``pd.NaT``
        =========== ==========

        The fill value is potentially specified in three ways. In order of
        precedence, these are

        1. The `fill_value` argument
        2. ``dtype.fill_value`` if `fill_value` is None and `dtype` is
           a ``SparseDtype``
        3. ``data.dtype.fill_value`` if `fill_value` is None and `dtype`
           is not a ``SparseDtype`` and `data` is a ``SparseArray``.

    kind : str
        Can be 'integer' or 'block', default is 'integer'.
        The type of storage for sparse locations.

        * 'block': Stores a `block` and `block_length` for each
          contiguous span of sparse values. This is best when
          sparse data tends to be clumped together, with large
          regions of ``fill-value`` values between sparse values.
        * 'integer': uses an integer to store the location of
          each sparse value.

    dtype : np.dtype or SparseDtype, optional
        The dtype to use for the SparseArray. For numpy dtypes, this
        determines the dtype of ``self.sp_values``. For SparseDtype,
        this determines ``self.sp_values`` and ``self.fill_value``.
    copy : bool, default False
        Whether to explicitly copy the incoming `data` array.
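The `fill_value` defaults, the three-way precedence, and the two `kind` options described in the docstring can be tried out with a short sketch like the one below. The input arrays and variable names are invented for illustration. The snippet assumes a recent pandas release in which the sparse index objects expose `blocs`, `blengths`, and `indices`.

```python
import numpy as np
import pandas as pd
from pandas.arrays import SparseArray

# Default fill values are inferred from the dtype of `data`.
float_arr = SparseArray(np.array([1.0, np.nan, np.nan]))  # NaN for float
int_arr = SparseArray(np.array([0, 0, 3]))                # 0 for int
bool_arr = SparseArray(np.array([False, True, False]))    # False for bool

# Precedence rule 2: fill_value is None, so the SparseDtype supplies it.
from_dtype = SparseArray([1, 1, 2], dtype=pd.SparseDtype("int64", 1))

# Precedence rule 1: an explicit fill_value wins over the SparseDtype's.
explicit = SparseArray([1, 1, 2], fill_value=2, dtype=pd.SparseDtype("int64", 1))

# Precedence rule 3: neither is given, so a SparseArray input keeps its own.
inherited = SparseArray(from_dtype)

# The two `kind` options index the same clumped data differently.
clumped = np.array([0, 0, 1, 1, 1, 0, 0, 2, 2, 0])
as_block = SparseArray(clumped, fill_value=0, kind="block")
as_integer = SparseArray(clumped, fill_value=0, kind="integer")

# 'block' stores one (start, length) pair per contiguous run of stored values.
block_starts = as_block.sp_index.blocs
block_lengths = as_block.sp_index.blengths

# 'integer' stores one position per stored value.
positions = as_integer.sp_index.indices

print(block_starts, block_lengths, positions)
```

With clumped data like this, 'block' needs two (start, length) pairs where 'integer' needs five positions, which is the trade-off the docstring describes.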

Callers 15

setup (Method) · 0.90
time_sparse_array (Method) · 0.90
setup (Method) · 0.90
make_block_array (Method) · 0.90
setup (Method) · 0.90
_get_dummies_1d (Function) · 0.90
create_block (Function) · 0.90
test_astype (Method) · 0.90
test_astype_bool (Method) · 0.90

Calls

no outgoing calls

Tested by 15

create_block (Function) · 0.72
test_astype (Method) · 0.72
test_astype_bool (Method) · 0.72
test_astype_all (Method) · 0.72
test_astype_nan_raises (Method) · 0.72
test_astype_copy_false (Method) · 0.72
test_astype_dt64_to_int64 (Method) · 0.72
test_get_attributes (Method) · 0.72
test_to_coo (Method) · 0.72
test_to_dense (Method) · 0.72
test_density (Method) · 0.72
test_series_from_coo (Method) · 0.72