Core Functions
The main entry points for reading and writing DataFrames to Zarr storage.
zarrwhals.to_zarr(df, store, *, chunks='auto', shards=None, compressors='auto', mode='w-')
Write a DataFrame to Zarr storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | IntoDataFrame | DataFrame to write (pandas, polars, or anything else that can be converted to a Narwhals DataFrame). | required |
| store | PathLike[str] \| StoreLike | Path or store object. | required |
| chunks | int \| Literal['auto'] \| None | Chunk size in rows, or "auto" to let Zarr decide (default: "auto"). | 'auto' |
| shards | int \| None | Shard size in rows (default: None). | None |
| compressors | CompressorsLike | Compressor codec(s). Can be "auto" (Zarr default), a Zarr codec object, or None for no compression (default: "auto"). | 'auto' |
| mode | ZarrWriteMode | Write mode (default: "w-"). With "w-", writing fails if the store already exists; "w" overwrites an existing store. | 'w-' |
Raises:
| Type | Description |
|---|---|
| FileExistsError | If mode="w-" and store already exists. |
| TypeError | If DataFrame type not supported. |
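The "w-" rule can be pictured with a small pre-flight check. This is only a sketch of the documented behaviour, not zarrwhals's implementation: `check_write_mode` is a hypothetical helper, and it assumes the store is a local directory path rather than a store object.

```python
from pathlib import Path
import tempfile


def check_write_mode(store_path: str, mode: str = "w-") -> None:
    """Hypothetical pre-flight check mirroring the documented "w-" behaviour."""
    # Assumption: the store is a local directory path, not a store object.
    if mode == "w-" and Path(store_path).exists():
        raise FileExistsError(f"{store_path} already exists; pass mode='w' to overwrite")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.zarr"
    check_write_mode(str(target))            # nothing there yet, so no error
    target.mkdir()                           # simulate a previous write
    try:
        check_write_mode(str(target))        # default "w-" now refuses
        refused = False
    except FileExistsError:
        refused = True
    check_write_mode(str(target), mode="w")  # "w" is allowed to overwrite
```

A caller that wants "create once, never clobber" semantics can keep the default and catch FileExistsError; a caller that regenerates the store on each run can pass mode="w" explicitly.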
Notes
For custom compression, pass a Zarr codec object (e.g., ZstdCodec(level=5)).
See the [Zarr compressors documentation](https://zarr.readthedocs.io/en/stable/user-guide/arrays/#compressors)
for available codecs and configuration options.
Examples:
Create new store (fails if exists):
>>> import pandas as pd
>>> import zarrwhals as zw
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>>> zw.to_zarr(df, "data.zarr") # mode="w-" default
Overwrite existing store:
>>> zw.to_zarr(df, "data.zarr", mode="w")
With chunking and sharding:
>>> zw.to_zarr(df, "data.zarr", chunks=5000, shards=50000, mode="w")
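To get a feel for what `chunks=5000, shards=50000` means, the arithmetic can be sketched in plain Python. `describe_layout` is a hypothetical helper for illustration, not part of zarrwhals, and it assumes each column is stored as a one-dimensional array chunked along rows. The divisibility check reflects Zarr's sharding requirement that a shard hold a whole number of chunks.

```python
import math
from typing import Optional


def describe_layout(n_rows: int, chunks: int, shards: Optional[int] = None) -> dict:
    """Hypothetical helper: count row chunks and shards for one 1-D column."""
    n_chunks = math.ceil(n_rows / chunks)
    if shards is None:
        return {"chunks": n_chunks, "chunks_per_shard": None, "shards": None}
    # Zarr sharding needs the shard size to be a whole multiple of the chunk size.
    if shards % chunks != 0:
        raise ValueError("shards must be a multiple of chunks")
    return {
        "chunks": n_chunks,
        "chunks_per_shard": shards // chunks,
        "shards": math.ceil(n_rows / shards),
    }


layout = describe_layout(120_000, chunks=5000, shards=50000)
print(layout)  # {'chunks': 24, 'chunks_per_shard': 10, 'shards': 3}
```

For a 120,000-row column this gives 24 chunks grouped ten to a shard, so each column is written as three shard objects rather than 24 separate chunk objects.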
zarrwhals.from_zarr(store, *, backend='pandas', columns=None, lazy=None)
from_zarr(
store: PathLike[str] | StoreLike,
*,
backend: Literal["pandas"],
columns: list[str] | None = None,
) -> pd.DataFrame
from_zarr(
store: PathLike[str] | StoreLike,
*,
backend: Literal["polars"],
columns: list[str] | None = None,
lazy: Literal[True] = True,
) -> pl.LazyFrame
from_zarr(
store: PathLike[str] | StoreLike,
*,
backend: Literal["polars"],
columns: list[str] | None = None,
lazy: Literal[False] = False,
) -> pl.DataFrame
from_zarr(
store: PathLike[str] | StoreLike,
*,
backend: Literal["dask"],
columns: list[str] | None = None,
) -> dd.DataFrame
Read DataFrame from Zarr storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| store | PathLike[str] \| StoreLike | Path or store object. | required |
| backend | Literal['pandas', 'polars', 'dask'] | Target backend: "pandas", "polars", or "dask" (default: "pandas"). | 'pandas' |
| columns | list[str] \| None | Optional columns to read (default: all). | None |
| lazy | bool \| None | For polars only: return LazyFrame (True) or DataFrame (False). Defaults to True for polars. Not applicable to pandas or dask. | None |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame \| pl.DataFrame \| pl.LazyFrame \| dd.DataFrame | DataFrame or LazyFrame in requested backend. |
Raises:
| Type | Description |
|---|---|
| ValueError | If store missing, columns not found, or invalid lazy/backend combination. |
| FileNotFoundError | If store path doesn't exist. |
Examples:
>>> import zarrwhals as zw
>>> df = zw.from_zarr("data.zarr", backend="pandas")
>>> lf = zw.from_zarr("data.zarr", backend="polars")
>>> df = zw.from_zarr("data.zarr", backend="polars", lazy=False)
>>> ddf = zw.from_zarr("data.zarr", backend="dask")
>>> df = zw.from_zarr("data.zarr", backend="pandas", columns=["a", "c"])
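The interaction between `backend` and `lazy` can be modelled as a small decision function. This is a sketch under stated assumptions, not the library's code: `resolve_lazy` is a hypothetical name, and it assumes that any explicit `lazy` value passed with pandas or dask is the "invalid lazy/backend combination" that raises ValueError.

```python
from typing import Optional


def resolve_lazy(backend: str, lazy: Optional[bool]) -> Optional[bool]:
    """Hypothetical model of the documented lazy/backend rules."""
    if backend not in ("pandas", "polars", "dask"):
        raise ValueError(f"unknown backend: {backend!r}")
    if backend == "polars":
        # Documented default: polars reads are lazy unless lazy=False is passed.
        return True if lazy is None else lazy
    if lazy is not None:
        # Assumption: any explicit lazy value with pandas or dask is rejected.
        raise ValueError(f"lazy is only valid with backend='polars', got {backend!r}")
    return None


print(resolve_lazy("polars", None))   # True  -> pl.LazyFrame
print(resolve_lazy("polars", False))  # False -> pl.DataFrame
print(resolve_lazy("pandas", None))   # None  -> lazy does not apply
```

The three printed cases line up with the polars and pandas overloads listed above: the polars default yields a LazyFrame, `lazy=False` yields an eager polars DataFrame, and pandas ignores the parameter when it is left unset.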
zarrwhals.get_spec(store)
Get DataFrameGroupSpec from a Zarr store.
Parameters:
Returns:
| Type | Description |
|---|---|
| DataFrameGroupSpec | Validated spec with metadata and structure. |
Raises:
| Type | Description |
|---|---|
| ValueError | If store structure invalid. |
| FileNotFoundError | If store path doesn't exist. |
Examples:
>>> import zarrwhals as zw
>>> spec = zw.get_spec("data.zarr")
>>> print(spec.attributes.column_order)
['a', 'b', 'c']
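One possible use of the column order is to validate a column selection before reading, so that a misspelled name is caught early. The sketch below uses a plain list as a stand-in for `spec.attributes.column_order`; `missing_columns` is a hypothetical helper, not a zarrwhals function.

```python
def missing_columns(requested, column_order):
    """Return requested names that are absent from the stored column order."""
    available = set(column_order)
    return [name for name in requested if name not in available]


column_order = ["a", "b", "c"]  # stand-in for spec.attributes.column_order
print(missing_columns(["a", "c"], column_order))  # []
print(missing_columns(["a", "z"], column_order))  # ['z']
```

An empty result means the selection is safe to pass as `columns=`; a non-empty one lists the names that do not exist in the store.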