explicitly encode chunk grid shape in _ShardIndex

(claude wrote this at my request. I agree with it)

# Make `_ShardIndex` store `chunks_per_shard` explicitly instead of inferring it from array shape

## Background

`_ShardIndex` (in `src/zarr/codecs/sharding.py`) currently has a single field:

```python
class _ShardIndex(NamedTuple):
    offsets_and_lengths: npt.NDArray[np.uint64]
```

The shard's chunk grid shape is not stored — it's reverse-engineered from the array's shape everywhere it's needed:

```python
@property
def chunks_per_shard(self) -> tuple[int, ...]:
    result = tuple(self.offsets_and_lengths.shape[0:-1])
    # The cast is required until https://github.com/numpy/numpy/pull/27211 is merged
    return cast("tuple[int, ...]", result)
```

This works because `offsets_and_lengths` is constructed with shape `(*chunks_per_shard, 2)`, so `.shape[:-1]` recovers the grid. But the encoding is irregular at the boundary: for a **0-dimensional array**, `chunks_per_shard == ()`, so `offsets_and_lengths` has shape `(2,)` rather than `(1, 2)`. The array always has `n_chunk_dims + 1` dimensions, which collapses to rank 1 for 0-D and breaks any method that assumes rank ≥ 2.

This is the root cause behind #3751 / #3966: `get_chunk_slices_vectorized` did `offsets_and_lengths[:, 0]`, which fails on the rank-1 array. #3966 patches that one method with a special-case branch, but the underlying representation is still irregular: the scalar methods rely on the `(2,)` shape, the vectorized methods need `(1, 2)`, and `chunks_per_shard` only works for 0-D by the accident of `(2,)[:-1] == ()`.
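The rank collapse can be reproduced with plain NumPy, independent of zarr. The following is a minimal sketch: `index_shape` is a hypothetical helper that only mirrors the `(*chunks_per_shard, 2)` layout described above, and the `[:, 0]` expression stands in for the column access that the vectorized method performed.

```python
import numpy as np


def index_shape(chunks_per_shard: tuple[int, ...]) -> tuple[int, ...]:
    # Hypothetical helper: one (offset, length) pair per chunk in the shard.
    return (*chunks_per_shard, 2)


# The grid is recoverable from the array shape for every rank, including 0-D...
for cps in [(2, 3), (4,), ()]:
    arr = np.zeros(index_shape(cps), dtype=np.uint64)
    assert tuple(arr.shape[:-1]) == cps
    print(cps, arr.shape, arr.ndim)

# ...but the 0-D index array is rank 1, so two-axis indexing fails.
zero_d = np.zeros(index_shape(()), dtype=np.uint64)
try:
    zero_d[:, 0]
    column_index_failed = False
except IndexError:
    column_index_failed = True
```

So the grid itself is never lost; what changes is the rank of the payload, and any code that indexes it with a fixed number of axes has to know about the 0-D case.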

## Proposal

Store the chunk grid shape as an explicit field; make `offsets_and_lengths` a dumb payload whose shape is no longer load-bearing:

```python
class _ShardIndex(NamedTuple):
    chunks_per_shard: tuple[int, ...]
    offsets_and_lengths: npt.NDArray[np.uint64]
```

With `chunks_per_shard` authoritative:

- The `chunks_per_shard` property becomes a trivial field read — and the `numpy/numpy#27211` cast workaround on line 132 goes away (it's a real tuple, not a numpy shape).
- The 0-D special case in `get_chunk_slices_vectorized` (added in #3966) can be removed — there's no longer a rank to infer from the array.
- The scattered `offsets_and_lengths.shape[:-1]` reads (e.g. the chunk iterators around lines 283, 301) become field reads.
- `offsets_and_lengths` could optionally be normalized to always-2-D `(prod(chunks_per_shard), 2)`, since the real shape now lives elsewhere — making *every* lookup method uniform. (Optional; can be a second step.)

The "is this 0-D?" question moves from *inspecting an array's rank* (or, in #3966's patch, inspecting the **caller's** query-array shape) to *reading the index's own stored schema* — which is where that knowledge belongs.

## Cost / scope

This is an API change to `_ShardIndex`'s constructor. Call sites that need updating (~4):

- `_ShardIndex.create_empty` — already *takes* `chunks_per_shard` as an argument; just store it.
- `_ShardReader.create_empty` — threads the value through.
- `_ShardIndex(index_array.as_numpy_array())` on the deserialization path (~line 715) — needs `chunks_per_shard` threaded in. It's available at that call site (the shard reader knows it), but it's a signature change.
- The `_ShardIndex(...)` construction around line 797.
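The deserialization change amounts to passing the grid shape alongside the decoded array. The sketch below shows the shape of that signature change under assumptions: `DecodedShardIndex` and `index_from_decoded` are hypothetical names, the decoded payload is taken to be a flat `uint64` array, and the existing `(*chunks_per_shard, 2)` layout is kept rather than the optional 2-D normalization.

```python
from math import prod
from typing import NamedTuple

import numpy as np
import numpy.typing as npt


class DecodedShardIndex(NamedTuple):
    # Hypothetical stand-in for the two-field _ShardIndex proposed above.
    chunks_per_shard: tuple[int, ...]
    offsets_and_lengths: npt.NDArray[np.uint64]


def index_from_decoded(
    chunks_per_shard: tuple[int, ...],
    decoded: npt.NDArray[np.uint64],
) -> DecodedShardIndex:
    # The caller supplies the grid shape; it is checked against the payload
    # size instead of being inferred from the payload's rank.
    expected_shape = (*chunks_per_shard, 2)
    if decoded.size != prod(expected_shape):
        raise ValueError(
            f"index has {decoded.size} entries, expected {prod(expected_shape)} "
            f"for chunks_per_shard={chunks_per_shard}"
        )
    return DecodedShardIndex(chunks_per_shard, decoded.reshape(expected_shape))
```

A side effect of threading the value through is that a payload whose size disagrees with the expected grid can be rejected at construction time rather than surfacing later as an indexing error.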

## Acceptance

- `_ShardIndex` stores `chunks_per_shard` explicitly; reading `chunks_per_shard` returns the stored value directly (a plain field read rather than a value derived from the array shape).
- The `numpy/numpy#27211` cast workaround is removed.
- The 0-D special-case branch added in #3966 is removed; `get_chunk_slices_vectorized` has a single uniform path.
- Existing sharding tests pass, including the 0-D regression tests from #3966 (`test_sharding_zero_dimensional`, `test_shard_index_get_chunk_slices_vectorized_zero_dimensional`).

## Related

- #3751 — original bug report (can't write a 0-D sharded array)
- #3966 — the targeted bugfix that this issue would let us simplify


explicitly encode chunk grid shape in _ShardIndex #3974
