An ExtensionArray for storing sparse data. SparseArray efficiently stores data with a high frequency of a specific fill value (e.g., zeros), saving memory by only retaining non-fill elements and their indices. This class is particularly useful for large datasets where most values are redundant.
| 294 | |
| 295 | @set_module("pandas.arrays") |
| 296 | class SparseArray(OpsMixin, PandasObject, ExtensionArray): |
| 297 | """ |
| 298 | An ExtensionArray for storing sparse data. |
| 299 | |
| 300 | SparseArray efficiently stores data with a high frequency of a |
| 301 | specific fill value (e.g., zeros), saving memory by only retaining |
| 302 | non-fill elements and their indices. This class is particularly |
| 303 | useful for large datasets where most values are redundant. |
| 304 | |
| 305 | Parameters |
| 306 | ---------- |
| 307 | data : array-like or scalar |
| 308 | A dense array of values to store in the SparseArray. This may contain |
| 309 | `fill_value`. |
| 310 | sparse_index : SparseIndex, optional |
| 311 | Index indicating the locations of sparse elements. |
| 312 | fill_value : scalar, optional |
| 313 | Elements in `data` that equal `fill_value` are not stored in the |
| 314 | SparseArray. For memory savings, this should be the most common value |
| 315 | in `data`. By default, `fill_value` depends on the dtype of `data`: |
| 316 | |
| 317 | =========== ========== |
| 318 | data.dtype  fill_value |
| 319 | =========== ========== |
| 320 | float       ``np.nan`` |
| 321 | int         ``0``      |
| 322 | bool        ``False``  |
| 323 | datetime64  ``pd.NaT`` |
| 324 | timedelta64 ``pd.NaT`` |
| 325 | =========== ========== |
| 326 | |
| 327 | The fill value can be specified in three ways. In order of |
| 328 | precedence, these are: |
| 329 | |
| 330 | 1. The `fill_value` argument |
| 331 | 2. ``dtype.fill_value`` if `fill_value` is None and `dtype` is |
| 332 | a ``SparseDtype`` |
| 333 | 3. ``data.dtype.fill_value`` if `fill_value` is None and `dtype` |
| 334 | is not a ``SparseDtype`` and `data` is a ``SparseArray``. |
| 335 | |
| 336 | kind : str |
| 337 | Can be 'integer' or 'block', default is 'integer'. |
| 338 | The type of storage for sparse locations. |
| 339 | |
| 340 | * 'block': stores a `block` and `block_length` for each |
| 341 | contiguous *span* of sparse values. This is best when |
| 342 | sparse data tends to be clumped together, with large |
| 343 | regions of `fill_value` between sparse values. |
| 344 | * 'integer': stores an integer location for |
| 345 | each sparse value. |
| 346 | |
| 347 | dtype : np.dtype or SparseDtype, optional |
| 348 | The dtype to use for the SparseArray. For numpy dtypes, this |
| 349 | determines the dtype of ``self.sp_values``. For SparseDtype, |
| 350 | this determines the dtype of ``self.sp_values`` and the default ``self.fill_value``. |
| 351 | copy : bool, default False |
| 352 | Whether to explicitly copy the incoming `data` array. |
| 353 |
no outgoing calls
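
The parameter descriptions above can be illustrated with a short sketch. It assumes a recent pandas release in which `pandas.arrays.SparseArray` and `pd.SparseDtype` are importable as shown; the variable names are illustrative only and do not come from the source. It shows the dtype-dependent default `fill_value`, the three precedence rules, and the two `kind` options.

```python
import numpy as np
import pandas as pd
from pandas.arrays import SparseArray

# Default fill_value is inferred from the dtype of `data`.
ints = SparseArray([0, 0, 1, 2])              # int data   -> fill_value 0
floats = SparseArray([np.nan, 1.0, np.nan])   # float data -> fill_value NaN
bools = SparseArray([False, True, False])     # bool data  -> fill_value False

# Only elements that differ from fill_value are kept in sp_values.
stored_ints = ints.sp_values.tolist()

# Precedence 1: an explicit `fill_value` argument wins over the dtype's.
explicit = SparseArray(
    [0, 0, 1, 2], fill_value=1, dtype=pd.SparseDtype("int64", 0)
)

# Precedence 2: with no `fill_value` argument, a SparseDtype supplies it.
from_dtype = SparseArray([0, 0, 1, 2], dtype=pd.SparseDtype("int64", 1))

# Precedence 3: with neither, an incoming SparseArray's own fill_value is kept.
from_sparse = SparseArray(SparseArray([5, 5, 1], fill_value=5))

# `kind` selects how the locations of stored values are recorded.
clumped = [0, 0, 1, 1, 1, 0, 0, 0, 2, 2]
block = SparseArray(clumped, kind="block")      # contiguous spans
integer = SparseArray(clumped, kind="integer")  # one location per value

print(ints.fill_value, floats.fill_value, bools.fill_value)
print(explicit.fill_value, from_dtype.fill_value, from_sparse.fill_value)
print(block.kind, integer.kind, block.npoints)
```

Both `kind` settings store the same values and differ only in the index representation, so the choice matters mainly for memory use and lookup cost when the non-fill values are clustered.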