[GH#540] [Future] Sherpa-ONNX Benchmarks: STT/TTS Latency, RAM, CPU on Windows Gaming Rig #17

Open
opened 2026-05-19 22:15:33 +02:00 by Max · 0 comments

Migrated from GitHub #540 (https://github.com/Bio1988/strategy-desktop/issues/540)
Originally created by @Bio1988 on 2026-05-15T07:43:44Z


Context

Sherpa ONNX is the definitive local speech pipeline (ADR 0012). The C.4 (STT) and C.5 (TTS) providers are integrated. This issue tracks the comprehensive benchmarks that were deferred from Batch 5C (#379).

Scope

Benchmark all Sherpa ONNX models planned for Strategy Desktop on a Windows iRacing rig:

STT Models (streaming, per-language)

| Model | Size | Priority |
|---|---|---|
| zipformer-en-20M | ~125 MB | Primary (EN) |
| zipformer-en-20M-mobile | ~105 MB | Low-end fallback |
| zipformer-en-303M | ~303 MB | High-accuracy |

TTS Models

| Model | Size | Priority |
|---|---|---|
| Kokoro-int8 (11 speakers) | ~143 MB | Primary |
| VITS LJSpeech | ~106 MB | Fast fallback |
| PocketTTS-int8 (voice cloning) | ~96 MB | Future voice packs |

Metrics

  • Latency: Generation time (ms) for 1s/3s/5s utterances
  • RAM: Resident set size during active STT/TTS
  • CPU: % utilization, E-Core vs P-Core impact
  • GPU: ONNX provider overhead (CUDA/DirectML vs CPU)
  • Gaming impact: FPS delta while iRacing runs
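
The latency metric above could be collected with a small timing harness along these lines. This is only a sketch under stated assumptions: `time_runs`, `percentile`, and `summarize` are hypothetical helper names, and the callable passed to `time_runs` stands in for whatever STT or TTS provider call ends up being benchmarked (no Sherpa ONNX API is assumed here). The real-time factor (processing time divided by audio duration) is an extra derived figure, not something the issue asks for.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_runs(fn: Callable[[], object], runs: int = 10, warmup: int = 2) -> List[float]:
    """Call fn repeatedly and return the wall-clock time of each run in ms.

    Warm-up calls are discarded so that one-time costs (model load,
    cache fill) do not skew the measured runs.
    """
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms: List[float], audio_seconds: float) -> Dict[str, float]:
    """Reduce raw latencies to median, p95, and real-time factor (RTF)."""
    median_ms = statistics.median(latencies_ms)
    return {
        "median_ms": median_ms,
        "p95_ms": percentile(latencies_ms, 95),
        # RTF < 1 means processing is faster than the audio it covers.
        "rtf": median_ms / 1000.0 / audio_seconds,
    }
```

A run per utterance length (1 s, 3 s, 5 s) would then be `summarize(time_runs(call), audio_seconds=3.0)`, where `call` is a zero-argument wrapper around the provider under test.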

Acceptance Criteria

  • Benchmark report with tables per model/metric
  • Recommendation for default model selection
  • FPS impact during active STT (PTT) and TTS (callout)
  • E-Core vs P-Core affinity recommendation
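
For the FPS-impact and report-table criteria, something like the following might be enough. It is a sketch, assuming frame times in milliseconds are available from some capture tool (the capture source is not specified in this issue); `average_fps`, `fps_delta`, and `markdown_table` are hypothetical names.

```python
from typing import Iterable, Sequence


def average_fps(frame_times_ms: Sequence[float]) -> float:
    """Average FPS over a capture: frame count divided by elapsed seconds."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


def fps_delta(baseline_ms: Sequence[float], loaded_ms: Sequence[float]) -> float:
    """FPS change with speech active; negative means frames were lost."""
    return average_fps(loaded_ms) - average_fps(baseline_ms)


def markdown_table(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render headers and rows as a pipe table for the benchmark report."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "---|" * len(headers),
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)
```

Averaging over total elapsed time, rather than averaging per-frame FPS values, keeps a few very short frames from inflating the result. One table per metric, with one row per model, would match the first acceptance criterion.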

Non-goals

  • Voice cloning quality benchmarks
  • Non-English model benchmarks (deferred to localization)
  • VAD/Keyword Spotting benchmarks (small models, negligible)