
Batch support for the next gen model #106

Open
rakeshv247 wants to merge 11 commits into speechmatics:main from rakeshv247:feat-lang-hints

Conversation

@rakeshv247

This change adds:

  • Support for the language hints feature, which applies only to the batch next-gen models. language_hints becomes a new field in the transcription_config. For example:

    {
        "language": "multi",
        "language_hints": ["en", "es"],
        "language_hints_strict": true
    }

  • Support for the new fields in metadata.language_pack_info in the transcript.json-v2 results.
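As an illustration of how such a config could serialize, here is a minimal standalone sketch (the dataclass and its to_dict helper are hypothetical stand-ins that mirror the field names above, not the SDK's actual TranscriptionConfig):

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BatchTranscriptionConfig:
    """Hypothetical stand-in mirroring the fields shown above."""

    language: str = "en"
    language_hints: Optional[list[str]] = None
    language_hints_strict: Optional[bool] = None

    def to_dict(self) -> dict:
        # Omit unset optional fields so they never reach the request body.
        return {k: v for k, v in asdict(self).items() if v is not None}


config = BatchTranscriptionConfig(
    language="multi",
    language_hints=["en", "es"],
    language_hints_strict=True,
)
print(config.to_dict())
# → {'language': 'multi', 'language_hints': ['en', 'es'], 'language_hints_strict': True}
```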

@rakeshv247 rakeshv247 marked this pull request as ready for review May 5, 2026 04:05
Comment thread sdk/batch/speechmatics/batch/_models.py Outdated
@giorgosHadji
Contributor

Have you tested this with an actual worker?

Comment thread sdk/batch/speechmatics/batch/_models.py
Comment thread tests/batch/test_models.py
@giorgosHadji
Contributor

LGTM in general

@rakeshv247
Author

rakeshv247 commented May 6, 2026

Have you tested this with an actual worker?

Tested the changes with omni and non-omni workers in a SaaS dev environment.

max_delay_mode: Optional[str] = None
transcript_filtering_config: Optional[TranscriptFilteringConfig] = None
audio_filtering_config: Optional[AudioFilteringConfig] = None
language_hints: Optional[list[str]] = None
Contributor


I'm probably ignorant of whether this has been discussed before, but would it be worth putting language_hints and language_hints_strict into a single config class, like our other configs?

That way users could call

lang_config = LangConfig(
    hints=["en", "jp"],
    strict=True,
)

config = TranscriptionConfig(
    model=OperatingPoint.OMNI,
    lang_config=lang_config,
)

Thoughts?

Contributor


Ah, that's a valid point! Forgot these! Nice shout

Author


I didn't do that because language_hints and language_hints_strict are flat scalar fields in the transcription_config, just like their peers max_delay and max_delay_mode. The other fields that are defined as dataclasses, such as TranscriptFilteringConfig and AudioFilteringConfig, map to real JSON objects rather than scalar types. Does this make sense?
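To make the distinction concrete, here is a simplified sketch (the classes below are illustrative stand-ins, not the SDK's actual definitions, and remove_disfluencies is used only as an example field): flat scalar fields serialize in place, while dataclass-typed fields serialize as nested JSON objects.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TranscriptFilteringConfig:
    # Backed by a real JSON object in the request, e.g. {"remove_disfluencies": true}.
    remove_disfluencies: bool = False


@dataclass
class TranscriptionConfig:
    language: str = "en"
    # Flat scalar peers, like max_delay and max_delay_mode:
    language_hints: Optional[list[str]] = None
    language_hints_strict: Optional[bool] = None
    # Dataclass-typed field that serializes to a nested object:
    transcript_filtering_config: Optional[TranscriptFilteringConfig] = None


cfg = TranscriptionConfig(
    language_hints=["en", "es"],
    transcript_filtering_config=TranscriptFilteringConfig(remove_disfluencies=True),
)
payload = asdict(cfg)
# language_hints stays a flat list; transcript_filtering_config becomes a nested dict.
print(payload["language_hints"])               # ['en', 'es']
print(payload["transcript_filtering_config"])  # {'remove_disfluencies': True}
```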

Contributor


Makes sense, and I understand that point. Just wondered if it might be good from a user perspective to group everything that way. Happy with the explanation though :)


language: str = "en"
operating_point: OperatingPoint = OperatingPoint.ENHANCED
model: Optional[OperatingPoint] = None
Contributor


Feels strange that users will call the OperatingPoint enum for a field called model.

Author

@rakeshv247 rakeshv247 May 6, 2026


Users can set either model or operating_point — both are accepted. When model is set, it takes precedence and is sent as operating_point in the request; only operating_point ever goes over the wire. The model field is purely an alias added for ergonomic familiarity and it mirrors how LLM APIs (OpenAI, Anthropic, etc.) expose model selection, making the omni-v1 use case feel natural to users coming from that ecosystem.
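A minimal sketch of the precedence rule described above (the function, its shape, and the enum values are illustrative assumptions, not the SDK's actual serializer):

```python
from enum import Enum
from typing import Optional


class OperatingPoint(str, Enum):
    # Member values are assumed for illustration.
    STANDARD = "standard"
    ENHANCED = "enhanced"
    OMNI = "omni"


def resolve_operating_point(
    operating_point: OperatingPoint = OperatingPoint.ENHANCED,
    model: Optional[OperatingPoint] = None,
) -> dict:
    # `model` is an ergonomic alias: when set, it takes precedence and is
    # sent as `operating_point`; only `operating_point` goes over the wire.
    chosen = model if model is not None else operating_point
    return {"operating_point": chosen.value}


print(resolve_operating_point(model=OperatingPoint.OMNI))
# → {'operating_point': 'omni'}
```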

Contributor


Oh no, I understand that. This is just a nit from me. It feels odd calling

model = OperatingPoint.OMNI

if someone were to use the model field.

Author


Unfortunately 😞, we have to close this PR for now (if you have followed the thread).

Contributor

@J-Jaywalker J-Jaywalker left a comment


Reviewed, left a couple of minor comments.
