Function and class documentation

Core Functions

The main entry points for reading and writing DataFrames to Zarr storage.

zarrwhals.to_zarr(df, store, *, chunks='auto', shards=None, compressors='auto', mode='w-')

Write a DataFrame to Zarr storage.

Parameters:

Name Type Description Default
df IntoDataFrame

DataFrame to write: pandas, polars, or any other object that can be converted to a Narwhals DataFrame.

required
store PathLike[str] | StoreLike

Path or store object.

required
chunks int | Literal['auto'] | None

Chunk size in rows, or "auto" to let Zarr decide (default: "auto").

'auto'
shards int | None

Shard size in rows (default: None).

None
compressors CompressorsLike

Compressor codec(s). Can be "auto" (Zarr default), a Zarr codec object, or None for no compression (default: "auto").

'auto'
mode ZarrWriteMode

Write mode (default: "w-"):

  • "w-": Create new store, fail if exists (safe default)
  • "w": Overwrite existing store completely

'w-'

Raises:

Type Description
FileExistsError

If mode="w-" and store already exists.

TypeError

If the DataFrame type is not supported.
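The "w-" mode and its FileExistsError can be turned into a simple "write only if absent" check. The sketch below is one possible pattern, not part of zarrwhals: write_if_absent and its writer parameter are hypothetical names introduced for illustration, and the writer is injectable only so the logic can be exercised without touching storage.

```python
from typing import Any, Callable, Optional


def write_if_absent(
    df: Any,
    store: str,
    writer: Optional[Callable[..., None]] = None,
) -> bool:
    """Write df with mode="w-"; return False if the store already exists."""
    if writer is None:
        # Imported lazily: only needed when performing a real write.
        import zarrwhals as zw

        writer = zw.to_zarr
    try:
        # "w-" refuses to touch an existing store, so nothing is overwritten.
        writer(df, store, mode="w-")
    except FileExistsError:
        return False
    return True
```

Calling write_if_absent(df, "data.zarr") twice should return True the first time and False the second, assuming the first write succeeded.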

Notes

For custom compression, pass a Zarr codec object (e.g., ZstdCodec(level=5)). See the Zarr compressors documentation (https://zarr.readthedocs.io/en/stable/user-guide/arrays/#compressors) for available codecs and configuration options.
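As a minimal sketch of the note above, the following wraps a write with a Zstandard codec. It assumes zarr-python 3, where ZstdCodec is importable from zarr.codecs; write_compressed is a hypothetical helper name, not part of zarrwhals.

```python
from typing import Any


def write_compressed(df: Any, store: str, level: int = 5) -> None:
    """Write df to store using Zstandard compression at the given level."""
    # Imported inside the function so that defining it does not require
    # the optional dependencies to be installed.
    from zarr.codecs import ZstdCodec  # assumption: zarr-python 3 layout
    import zarrwhals as zw

    # mode="w" overwrites any existing store at this location.
    zw.to_zarr(df, store, compressors=ZstdCodec(level=level), mode="w")
```

Higher levels generally trade write speed for smaller output; the codec documentation linked above lists the accepted range.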

Examples:

Create new store (fails if exists):

>>> import pandas as pd
>>> import zarrwhals as zw
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>>> zw.to_zarr(df, "data.zarr")  # mode="w-" default

Overwrite existing store:

>>> zw.to_zarr(df, "data.zarr", mode="w")

With chunking and sharding:

>>> zw.to_zarr(df, "data.zarr", chunks=5000, shards=50000, mode="w")
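To see what a chunks/shards pair implies for a given table, the arithmetic can be sketched in plain Python. This is an illustration only: shard_layout is a hypothetical helper, the 120,000-row table is made up, and the check assumes the shard size must be a whole multiple of the chunk size, as Zarr's sharding codec expects.

```python
import math


def shard_layout(n_rows: int, chunks: int, shards: int) -> dict[str, int]:
    """Return how many chunks and shards a table of n_rows rows would need."""
    if chunks <= 0 or shards <= 0:
        raise ValueError("chunks and shards must be positive")
    if shards % chunks != 0:
        # Assumption: each shard holds a whole number of chunks.
        raise ValueError("shards must be a multiple of chunks")
    return {
        "chunks_per_shard": shards // chunks,
        "n_shards": math.ceil(n_rows / shards),
        "n_chunks": math.ceil(n_rows / chunks),
    }


# Hypothetical 120,000-row table with the sizes from the example above.
print(shard_layout(120_000, chunks=5000, shards=50000))
# {'chunks_per_shard': 10, 'n_shards': 3, 'n_chunks': 24}
```

Chunks remain the unit of reading, while shards group them into fewer stored objects, which tends to matter most on object stores and filesystems that handle many small files poorly.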

zarrwhals.from_zarr(store, *, backend='pandas', columns=None, lazy=None)

from_zarr(
    store: PathLike[str] | StoreLike,
    *,
    backend: Literal["pandas"],
    columns: list[str] | None = None,
) -> pd.DataFrame
from_zarr(
    store: PathLike[str] | StoreLike,
    *,
    backend: Literal["polars"],
    columns: list[str] | None = None,
    lazy: Literal[True] = True,
) -> pl.LazyFrame
from_zarr(
    store: PathLike[str] | StoreLike,
    *,
    backend: Literal["polars"],
    columns: list[str] | None = None,
    lazy: Literal[False] = False,
) -> pl.DataFrame
from_zarr(
    store: PathLike[str] | StoreLike,
    *,
    backend: Literal["dask"],
    columns: list[str] | None = None,
) -> dd.DataFrame

Read a DataFrame from Zarr storage.

Parameters:

Name Type Description Default
store PathLike[str] | StoreLike

Path or store object.

required
backend Literal['pandas', 'polars', 'dask']

Target backend: "pandas", "polars", or "dask" (default: "pandas").

'pandas'
columns list[str] | None

Optional columns to read (default: all).

None
lazy bool | None

For polars only: return a LazyFrame (True) or a DataFrame (False). If None (the default), polars returns a LazyFrame. Not applicable to pandas or dask.

None

Returns:

Type Description
pd.DataFrame | pl.LazyFrame | pl.DataFrame | dd.DataFrame

DataFrame or LazyFrame in requested backend.

Raises:

Type Description
ValueError

If the store is missing, requested columns are not found, or the lazy/backend combination is invalid.

FileNotFoundError

If store path doesn't exist.
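The interaction between backend and lazy can be summarized as a small decision function. This is a sketch of the documented behavior, not the library's implementation: resolve_lazy is a hypothetical name, and it assumes that any explicit lazy value combined with pandas or dask counts as an invalid combination.

```python
from typing import Optional


def resolve_lazy(backend: str, lazy: Optional[bool]) -> bool:
    """Return True if a polars LazyFrame should be produced."""
    if backend not in ("pandas", "polars", "dask"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "polars":
        # None means "use the default", which is lazy for polars.
        return True if lazy is None else lazy
    if lazy is not None:
        # Assumption: lazy is rejected outright for pandas and dask.
        raise ValueError(f"lazy is not applicable to backend {backend!r}")
    return False


print(resolve_lazy("polars", None))   # True
print(resolve_lazy("polars", False))  # False
print(resolve_lazy("pandas", None))   # False
```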

Examples:

>>> import zarrwhals as zw
>>> df = zw.from_zarr("data.zarr", backend="pandas")
>>> lf = zw.from_zarr("data.zarr", backend="polars")
>>> df = zw.from_zarr("data.zarr", backend="polars", lazy=False)
>>> ddf = zw.from_zarr("data.zarr", backend="dask")
>>> df = zw.from_zarr("data.zarr", backend="pandas", columns=["a", "c"])

zarrwhals.get_spec(store)

Get the DataFrameGroupSpec from a Zarr store.

Parameters:

Name Type Description Default
store PathLike[str] | StoreLike

Path or store object.

required

Returns:

Type Description
DataFrameGroupSpec

Validated spec with metadata and structure.

Raises:

Type Description
ValueError

If the store structure is invalid.

FileNotFoundError

If store path doesn't exist.

Examples:

>>> import zarrwhals as zw
>>> spec = zw.get_spec("data.zarr")
>>> print(spec.attributes.column_order)
['a', 'b', 'c']
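Since the spec exposes column_order, it can be used to validate a column selection before reading any data. The helper below is a hypothetical sketch (check_columns is not part of zarrwhals) that works on plain lists, so it runs without a store.

```python
from typing import Sequence


def check_columns(column_order: Sequence[str], wanted: Sequence[str]) -> list[str]:
    """Return wanted as a list, or raise ValueError naming unknown columns."""
    available = set(column_order)
    missing = [name for name in wanted if name not in available]
    if missing:
        raise ValueError(f"columns not in store: {missing}")
    return list(wanted)


# Using the column order printed in the example above.
print(check_columns(["a", "b", "c"], ["a", "c"]))  # ['a', 'c']

# With a real store, this would likely be combined as:
#   spec = zw.get_spec("data.zarr")
#   cols = check_columns(spec.attributes.column_order, ["a", "c"])
#   df = zw.from_zarr("data.zarr", backend="pandas", columns=cols)
```

Checking against the spec first gives a clear error message up front and avoids opening column arrays for a request that cannot succeed.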