Batch support for the next gen model #106
rakeshv247 wants to merge 11 commits into speechmatics:main from
Conversation
Have you tested this with an actual worker?

LGTM in general
Tested the changes with |
max_delay_mode: Optional[str] = None
transcript_filtering_config: Optional[TranscriptFilteringConfig] = None
audio_filtering_config: Optional[AudioFilteringConfig] = None
language_hints: Optional[list[str]] = None
I'm probably ignorant of whether this has been discussed before, but would it be worth putting language_hints and language_hints_strict into a single config class like our other configs?
That way users could call

lang_config = LangConfig(
    hints=["en", "jp"],
    strict=True,
)
config = TranscriptionConfig(
    model=OperatingPoint.OMNI,
    lang_config=lang_config,
)

Thoughts?
Ah, that's a valid point! Forgot these! Nice shout
I didn't do that because language_hints and language_hints_strict are flat scalar fields in the transcription_config, just like their peers max_delay and max_delay_mode. The other fields that are defined as dataclasses are real JSON objects rather than scalar types, for example TranscriptFilteringConfig and AudioFilteringConfig. Does this make sense?
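For illustration, the distinction looks roughly like this in the transcription_config payload. This is only a sketch: the values are made up, and the keys inside the nested objects are placeholders rather than the SDK's actual fields.

transcription_config = {
    # flat scalar fields, set directly on TranscriptionConfig
    "language": "en",
    "max_delay": 2.0,
    "max_delay_mode": "flexible",
    "language_hints": ["en", "ja"],
    "language_hints_strict": True,
    # nested JSON objects, hence modelled as dataclasses
    "transcript_filtering_config": {"remove_disfluencies": True},
    "audio_filtering_config": {"volume_threshold": 0.0},
}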
Makes sense, and understand that point. Just wondered if it might be good from a user perspective to group everything that way. Happy with the explanation though :)
language: str = "en"
operating_point: OperatingPoint = OperatingPoint.ENHANCED
model: Optional[OperatingPoint] = None
Feels strange that users will pass the OperatingPoint enum to a field called model.
Users can set either model or operating_point — both are accepted. When model is set, it takes precedence and is sent as operating_point in the request; only operating_point ever goes over the wire. The model field is purely an alias added for ergonomic familiarity and it mirrors how LLM APIs (OpenAI, Anthropic, etc.) expose model selection, making the omni-v1 use case feel natural to users coming from that ecosystem.
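A minimal sketch of that precedence, assuming a dataclass config and a hypothetical as_request() serialiser (neither is the SDK's actual code, and the OMNI value is an assumption):

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperatingPoint(str, Enum):
    ENHANCED = "enhanced"
    OMNI = "omni-v1"  # assumed value for the next-gen model


@dataclass
class TranscriptionConfig:
    language: str = "en"
    operating_point: OperatingPoint = OperatingPoint.ENHANCED
    model: Optional[OperatingPoint] = None  # alias; takes precedence when set

    def as_request(self) -> dict:
        # Only operating_point ever goes over the wire; model is resolved into it.
        effective = self.model if self.model is not None else self.operating_point
        return {"language": self.language, "operating_point": effective.value}

Under that sketch, TranscriptionConfig(model=OperatingPoint.OMNI).as_request() and TranscriptionConfig(operating_point=OperatingPoint.OMNI).as_request() produce the same request payload.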
Oh no, I understand that. This is just a nit from me. It feels odd writing

model = OperatingPoint.OMNI

if someone were to use the model field.
Unfortunately 😞, we have to close this PR for now (if you have followed the thread).
J-Jaywalker left a comment
Reviewed, left a couple of minor comments.
This change adds:

- Support for the language hints feature, which is applicable only to the batch next-gen models. The language_hints info will be a new field in the transcription_config; a sketch of its use is shown after this list.
- Support for new fields in the metadata.language_pack_info from transcript.json-v2 results.
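As an example of the first point, usage might look roughly like the following. The import path, the OMNI enum member, and the language_hints_strict flag are assumptions drawn from the review thread above, not verified against the released package.

from speechmatics.models import OperatingPoint, TranscriptionConfig

config = TranscriptionConfig(
    language="en",
    model=OperatingPoint.OMNI,     # alias for operating_point (see review thread)
    language_hints=["en", "ja"],   # new field introduced by this PR
    language_hints_strict=True,    # companion flag discussed in the review
)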