
Why is the context axis required for expert sharding when shard_exp_on_fsdp is enabled? #4249

Description

@pathfinder-pf

Feature or Model Request

No response

Additional Context

No response
