
mlx-whisper

Link parkin’: mlx-whisper

Speech recognition with Whisper in MLX. Whisper is a set of open source speech recognition models from OpenAI, ranging from 39 million to 1.5 billion parameters.

I had been experimenting with transcription and diarization using WhisperX, but it turned out to be pretty slow on an M2 MacBook. mlx-whisper is pretty honking fast, although it only does transcription, not diarization. I think diarization could be handled by running pyannote.audio alongside it.

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned to your own data for even better performance.
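Here is a minimal sketch of what that pairing might look like. It assumes the mlx-whisper package (Apple silicon only), pyannote.audio 3.x, and a Hugging Face token with access to the gated `pyannote/speaker-diarization-3.1` model; newer pyannote releases changed some of these arguments. The helper names (`Turn`, `transcribe`, `diarize`, `assign_speakers`, `run`) and the max-overlap merge heuristic are my own illustration, not something either library provides. The library imports are deferred so the merge logic can be exercised without MLX or PyTorch installed.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaker turn from diarization (hypothetical helper type)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def transcribe(audio_path, repo="mlx-community/whisper-large-v3-turbo"):
    import mlx_whisper  # lazy import: only needed when actually transcribing

    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=repo)
    # Each segment is a dict with "start" and "end" (seconds) and "text".
    return result["segments"]


def diarize(audio_path, hf_token):
    from pyannote.audio import Pipeline  # lazy import; pyannote.audio 3.x API

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    annotation = pipeline(audio_path)
    return [
        Turn(seg.start, seg.end, label)
        for seg, _, label in annotation.itertracks(yield_label=True)
    ]


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turns overlap it most."""
    labeled = []
    for seg in segments:
        overlap = {}
        for t in turns:
            # Length of the intersection of [seg.start, seg.end] and [t.start, t.end].
            o = min(seg["end"], t.end) - max(seg["start"], t.start)
            if o > 0:
                overlap[t.speaker] = overlap.get(t.speaker, 0.0) + o
        speaker = max(overlap, key=overlap.get) if overlap else "UNKNOWN"
        labeled.append({**seg, "speaker": speaker})
    return labeled


def run(audio_path, hf_token):
    """Transcribe with mlx-whisper, diarize with pyannote, then merge."""
    return assign_speakers(transcribe(audio_path), diarize(audio_path, hf_token))
```

Usage would be something like `for s in run("episode.wav", token): print(s["speaker"], s["text"])`, where the file name is just a placeholder. Matching on overlap is crude; segments that straddle a speaker change get a single label, so word-level timestamps would be the next refinement.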

I thought I’d already mentioned mlx-whisper, ahead of parakeet-mlx, but apparently not. In any event, I’ve actually put it to the test a little bit for retrocast. The processing rate seems quite acceptable for high-quality transcription, although that impression needs some serious benchmarking to confirm. moonshine is also in the mix.
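One simple way to start that benchmarking is a real-time factor (RTF): processing time divided by audio duration, where anything below 1.0 is faster than real time. The sketch below assumes WAV input (the standard-library `wave` module can't read MP3s) and takes the transcriber as a plain callable, so the same harness could wrap mlx-whisper, parakeet-mlx, or moonshine. The function names are hypothetical, and the warm-up run is there because the first call may include model download and loading.

```python
import statistics
import time
import wave


def wav_duration(path):
    """Duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def real_time_factor(elapsed_s, audio_s):
    """Processing time divided by audio length; < 1.0 means faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_s / audio_s


def benchmark(transcribe_fn, audio_path, runs=3, warmup=1):
    """Call transcribe_fn(audio_path) repeatedly and return the median RTF."""
    duration = wav_duration(audio_path)
    for _ in range(warmup):
        transcribe_fn(audio_path)  # excluded from timing: model load/download
    rtfs = []
    for _ in range(runs):
        t0 = time.perf_counter()
        transcribe_fn(audio_path)
        rtfs.append(real_time_factor(time.perf_counter() - t0, duration))
    return statistics.median(rtfs)
```

For mlx-whisper this might be `benchmark(lambda p: mlx_whisper.transcribe(p, path_or_hf_repo="mlx-community/whisper-large-v3-turbo"), "episode.wav")`, with the file name again a placeholder. Taking the median over a few runs smooths out one-off hiccups; a serious comparison would also want a range of episode lengths and a look at transcript quality, not just speed.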

© 2008-2025 C. Ross Jam. Licensed under CC BY-NC-SA 4.0 Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.