mlx-whisper

Posted on: Sun 21 December 2025

Speech recognition with Whisper in MLX. Whisper is a set of open source speech recognition models from OpenAI, ranging from 39 million to 1.5 billion parameters.

I had been experimenting with transcription and diarization using WhisperX, which turned out to be pretty slow on an M2 MacBook. mlx-whisper is pretty honking fast, although it only does transcription. I think diarization can be handled by pairing it with pyannote.audio.

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned on your own data for even better performance.
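To make the "transcribe with one tool, diarize with another" idea concrete, here is a minimal sketch of the merge step only. It assumes the transcript arrives as Whisper-style segments (dicts with `start`, `end`, and `text`) and the diarization as `(start, end, speaker)` turns; the function name `assign_speakers` and the sample data are hypothetical, and neither library is actually called here.

```python
from typing import Dict, List, Optional, Tuple

# Assumed shape of one diarization turn: (start_s, end_s, speaker_label)
Turn = Tuple[float, float, str]


def assign_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Label each transcript segment with the speaker whose turns overlap it most."""
    labeled = []
    for seg in segments:
        overlap_by_speaker: Dict[str, float] = {}
        for start, end, speaker in turns:
            # Length of the time interval shared by the segment and the turn
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > 0:
                overlap_by_speaker[speaker] = overlap_by_speaker.get(speaker, 0.0) + overlap
        best: Optional[str] = None
        if overlap_by_speaker:
            best = max(overlap_by_speaker, key=overlap_by_speaker.get)
        # Segments that overlap no turn at all keep speaker=None
        labeled.append({**seg, "speaker": best})
    return labeled


# Hypothetical sample data standing in for real transcription and diarization output
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
    {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(seg["speaker"], seg["text"])
```

Picking the speaker with the largest total overlap is one simple policy; a segment that straddles a speaker change will still get a single label, so splitting on word-level timestamps may be worth trying if that turns out to matter.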

Thought I’d mentioned mlx-whisper ahead of parakeet-mlx. In any event, I’ve actually put it to the test a little bit for retrocast. The processing rate is quite acceptable for high-quality transcription. However, this needs some serious benchmarking to confirm. moonshine is also in the mix.
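For the benchmarking, a rough starting point is to time a transcription call and divide the audio duration by the wall-clock time. The sketch below assumes nothing about any particular library: `speed_factor` is a hypothetical helper, and the `fake_transcribe` stand-in would be swapped for a real call into mlx-whisper, parakeet-mlx, or moonshine.

```python
import time
from typing import Any, Callable, Tuple


def speed_factor(run: Callable[[], Any], audio_seconds: float) -> Tuple[Any, float]:
    """Run a transcription callable and return (result, audio seconds per wall-clock second)."""
    started = time.perf_counter()
    result = run()
    # Guard against a zero reading from the timer on extremely fast calls
    elapsed = max(time.perf_counter() - started, 1e-9)
    return result, audio_seconds / elapsed


def fake_transcribe() -> str:
    """Stand-in for a real transcriber; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return "hello world"


# Pretend the audio clip is 60 seconds long
text, factor = speed_factor(fake_transcribe, audio_seconds=60.0)
print(f"{factor:.1f}x real time")
```

A factor above 1 means faster than real time. A single run is noisy, and a first run may include model download or load time, so repeating across several files and discarding the first run is probably the fairer comparison.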