Batch support for the next gen model #106
rakeshv247 wants to merge 11 commits into speechmatics:main from
Conversation
Have you tested this with an actual worker?

LGTM in general
Tested the changes with |
max_delay_mode: Optional[str] = None
transcript_filtering_config: Optional[TranscriptFilteringConfig] = None
audio_filtering_config: Optional[AudioFilteringConfig] = None
language_hints: Optional[list[str]] = None
I'm probably ignorant of whether this has been discussed before, but would it be worth putting language_hints and language_hints_strict into a single config class like our other configs?
That way users could call

lang_config = LangConfig(
    hints=["en", "jp"],
    strict=True,
)
config = TranscriptionConfig(
    model=OperatingPoint.OMNI,
    lang_config=lang_config,
)

Thoughts?
Ah, that's a valid point! Forgot these! Nice shout
I didn't do that because language_hints and language_hints_strict are flat scalar fields in the transcription_config, just like their peers max_delay and max_delay_mode. The other fields that are defined as dataclasses are real JSON objects rather than scalar types, for example TranscriptFilteringConfig and AudioFilteringConfig. Does this make sense?
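For illustration, the distinction looks roughly like this in the transcription_config payload. This is only a sketch: the values are made up, and the keys inside the nested objects are placeholders rather than the SDK's actual fields.

transcription_config = {
    # flat scalar fields, set directly on TranscriptionConfig
    "language": "en",
    "max_delay": 2.0,
    "max_delay_mode": "flexible",
    "language_hints": ["en", "ja"],
    "language_hints_strict": True,
    # nested JSON objects, hence modelled as dataclasses
    "transcript_filtering_config": {"remove_disfluencies": True},
    "audio_filtering_config": {"volume_threshold": 0.0},
}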
Makes sense, and understand that point. Just wondered if it might be good from a user perspective to group everything that way. Happy with the explanation though :)
language: str = "en"
operating_point: OperatingPoint = OperatingPoint.ENHANCED
model: Optional[OperatingPoint] = None
Feels strange that users will pass the OperatingPoint enum to a field called model.
Users can set either model or operating_point — both are accepted. When model is set, it takes precedence and is sent as operating_point in the request; only operating_point ever goes over the wire. The model field is purely an alias added for ergonomic familiarity and it mirrors how LLM APIs (OpenAI, Anthropic, etc.) expose model selection, making the omni-v1 use case feel natural to users coming from that ecosystem.
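A minimal sketch of that precedence, assuming a dataclass config and a hypothetical as_request() serialiser (neither is the SDK's actual code, and the OMNI value is an assumption):

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperatingPoint(str, Enum):
    ENHANCED = "enhanced"
    OMNI = "omni-v1"  # assumed value for the next-gen model


@dataclass
class TranscriptionConfig:
    language: str = "en"
    operating_point: OperatingPoint = OperatingPoint.ENHANCED
    model: Optional[OperatingPoint] = None  # alias; takes precedence when set

    def as_request(self) -> dict:
        # Only operating_point ever goes over the wire; model is resolved into it.
        effective = self.model if self.model is not None else self.operating_point
        return {"language": self.language, "operating_point": effective.value}

Under that sketch, TranscriptionConfig(model=OperatingPoint.OMNI).as_request() and TranscriptionConfig(operating_point=OperatingPoint.OMNI).as_request() produce the same request payload.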
Oh no, I understand that. This is just a nit from me. It feels odd writing

model = OperatingPoint.OMNI

if someone were to use the model field.
Unfortunately 😞, we have to close this PR for now (if you have followed the thread).
J-Jaywalker left a comment
Reviewed, left a couple of minor comments.
This change adds:

- Support for the language hints feature, which is applicable only to the batch next-gen models. The language_hints info will be a new field in the transcription_config; a sketch of its use is shown after this list.
- Support for new fields in the metadata.language_pack_info from transcript.json-v2 results.
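As an example of the first point, usage might look roughly like the following. The import path, the OMNI enum member, and the language_hints_strict flag are assumptions drawn from the review thread above, not verified against the released package.

from speechmatics.models import OperatingPoint, TranscriptionConfig

config = TranscriptionConfig(
    language="en",
    model=OperatingPoint.OMNI,     # alias for operating_point (see review thread)
    language_hints=["en", "ja"],   # new field introduced by this PR
    language_hints_strict=True,    # companion flag discussed in the review
)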