NLP · January 12, 2025 · 8 min read

Production Speech-to-Text with Whisper: Moroccan Arabic Dialect Support

Deploying OpenAI Whisper for multilingual transcription — model selection, performance optimizations, and fine-tuning for Moroccan Darija.

Model Selection

Model      Params   WER (EN)   Relative speed
tiny       39M      14%        32x
base       74M      10%        16x
small      244M     7%         6x
medium     769M     5%         2x
large-v3   1.5B     3%         1x

For production workloads with latency constraints, small is the sweet spot: 7% English WER at roughly 6x the speed of large-v3.
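The trade-off can be made explicit with a small helper. This is only a sketch: the figures are copied from the table above, and `pick_model` and its `min_speed` parameter are hypothetical names introduced for illustration, not part of Whisper.

```python
# Figures copied from the table above:
# (model name, English WER in %, speed relative to large-v3).
MODELS = [
    ("tiny", 14, 32),
    ("base", 10, 16),
    ("small", 7, 6),
    ("medium", 5, 2),
    ("large-v3", 3, 1),
]


def pick_model(min_speed: float) -> str:
    """Return the lowest-WER model that is at least `min_speed` times
    faster than large-v3 (hypothetical helper, for illustration only)."""
    candidates = [m for m in MODELS if m[2] >= min_speed]
    if not candidates:
        raise ValueError(f"no model is {min_speed}x faster than large-v3")
    # Among the models that are fast enough, prefer the most accurate one.
    return min(candidates, key=lambda m: m[1])[0]
```

With a budget of 5x the speed of large-v3, `pick_model(5)` returns `'small'`. Relaxing the budget to 1x returns `'large-v3'`.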

FastAPI Deployment

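import os
import tempfile

import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
# Load the model once at startup so every request reuses the same GPU copy.
model = whisper.load_model('small', device='cuda')

@app.post('/transcribe')
async def transcribe(audio: UploadFile, language: str = 'ar'):
    # model.transcribe() expects a file path or a waveform array, not raw
    # bytes, so write the upload to a temporary file that ffmpeg can decode.
    suffix = os.path.splitext(audio.filename or '')[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(await audio.read())
    try:
        result = model.transcribe(tmp.name, language=language, fp16=True)
    finally:
        os.remove(tmp.name)
    return {'text': result['text'], 'language': result['language']}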
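For a quick check of the endpoint, a standard-library client along these lines could be used. It is a minimal sketch under some assumptions: the server is taken to run at `http://localhost:8000`, and `build_multipart` and `transcribe_file` are hypothetical helper names. The form field is called `audio` to match the endpoint's parameter, and `language` goes in the query string because FastAPI treats plain `str` parameters as query parameters.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, base_url: str = "http://localhost:8000",
                    language: str = "ar") -> dict:
    """POST an audio file to /transcribe and return the decoded JSON reply."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("audio", os.path.basename(path), f.read())
    query = urllib.parse.urlencode({"language": language})
    request = urllib.request.Request(
        f"{base_url}/transcribe?{query}",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe_file("clip.wav")` against a running server would return the same `text` and `language` keys the endpoint produces. The file name here is a placeholder.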

Darija Fine-Tuning Dataset

The fine-tuning set combines Mozilla Common Voice Arabic with scraped Moroccan radio recordings. Fine-tuning for 3 epochs reduces WER on Darija from 32% to 18%, a 14-point absolute drop (about 44% relative).
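For reference, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows the standard definition in plain Python. It is not necessarily the exact scoring script behind the numbers above, and it assumes whitespace tokenization with no text normalization. `word_error_rate` is a name chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (classic Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words to an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.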

Tags: Whisper, Speech-to-Text, Arabic, Moroccan Darija, Audio

Ossama Elhakki

AI Engineer & ML Systems Builder — Morocco