Optimising Whisper — Finetune Parameters for Enhanced Performance
When working with Whisper for automatic speech recognition (ASR), tuning the model parameters significantly affects the accuracy, coherence, and speed of transcriptions. Understanding the effect of each parameter allows you to optimize the performance of Whisper to best suit your needs. Here is a detailed overview of the critical parameters and how adjusting them can improve Whisper’s performance for different audio scenarios.
1. chunk_length_s: Controlling Chunk Size
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Example setup — the model id is illustrative; any Whisper checkpoint works
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=12,  # Adjust this based on your needs
    stride_length_s=2,
    batch_size=8,
    torch_dtype=torch_dtype,
    device=device,
)
Explanation:
chunk_length_s determines the length of each audio chunk (e.g., 10–30 seconds) that is processed independently.
- Shorter chunks (e.g., 10–12 seconds) keep memory usage manageable, especially on smaller devices, but they may struggle to maintain context across longer conversations.
- Longer chunks (e.g., 25–30 seconds) provide better context by capturing more of the conversation in one go. However, they require more…
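To build intuition for how chunk_length_s and stride_length_s interact, here is a rough sketch of how overlapping windows are laid out. This is a hypothetical helper for illustration, not the Transformers pipeline internals: it assumes each chunk overlaps its neighbours by stride_length_s on both sides, so consecutive chunk starts advance by chunk_length_s − 2 × stride_length_s.

```python
def chunk_windows(total_s, chunk_length_s=12, stride_length_s=2):
    """Illustrative sketch of overlapping chunk windows (hypothetical helper).

    Each chunk overlaps its neighbours by stride_length_s on each side,
    so the step between consecutive chunk starts is chunk - 2 * stride.
    Returns a list of (start, end) times in seconds.
    """
    step = chunk_length_s - 2 * stride_length_s
    windows = []
    start = 0.0
    while start < total_s:
        windows.append((start, min(start + chunk_length_s, total_s)))
        start += step
    return windows

# A 30-second clip with 12 s chunks and 2 s strides yields four
# overlapping windows, each sharing 4 s of audio with its neighbour:
print(chunk_windows(30))  # [(0.0, 12.0), (8.0, 20.0), (16.0, 28.0), (24.0, 30)]
```

The overlap is what lets the pipeline stitch chunk transcriptions back together without cutting words at chunk boundaries; larger strides give more context at the seams at the cost of more duplicated computation.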