Link parkin’: mlx-whisper
Speech recognition with Whisper in MLX. Whisper is a set of open source speech recognition models from OpenAI, ranging from 39 million to 1.5 billion parameters.
I had been experimenting with transcription and diarization using WhisperX, which turned out to be pretty slow on an M2 MacBook. mlx-whisper is pretty honking fast, although it only does transcription. I think diarization can be handled by running pyannote.audio alongside it and merging the two outputs.
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines, that can be further finetuned to your own data for even better performance.
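To make the "transcribe with one tool, diarize with another" idea a bit more concrete, here is a rough sketch of the merge step only. It assumes the transcription side yields segments as dicts with "start", "end" and "text" keys (the shape Whisper-style transcribers usually return), and that the diarization side has been flattened into (start, end, speaker) tuples. The function names are mine, purely for illustration, and not part of either library.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Seconds shared by two time intervals (0.0 if they do not intersect)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker who overlaps it most.

    segments: iterable of dicts with "start", "end", "text" (assumed shape).
    turns:    iterable of (start, end, speaker) tuples (assumed shape).
    """
    labeled = []
    for seg in segments:
        totals = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        # Pick the speaker with the most overlapping time, if any.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": speaker})
    return labeled


if __name__ == "__main__":
    # Made-up data, only to show the shapes involved.
    segments = [
        {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
        {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
    ]
    turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(f'{seg["speaker"]}: {seg["text"]}')
```

Picking the speaker by largest overlap is the simplest thing that could work. It will mislabel segments where two people talk over each other, so treat it as a starting point rather than a finished pipeline.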
I thought I’d already mentioned mlx-whisper, ahead of parakeet-mlx. In any event, I’ve put it to the test a little for retrocast. The processing rate seems quite acceptable for high-quality transcription, though that needs some serious benchmarking to confirm. moonshine is also in the mix.
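For the benchmarking, one simple number to compare across engines is the real-time factor: processing time divided by audio duration, where lower is faster. The sketch below is a minimal harness for that, assuming each engine can be wrapped in a callable that takes an audio path. The name real_time_factor and its parameters are hypothetical, and the audio duration has to be supplied by the caller.

```python
import time


def real_time_factor(transcribe, audio_path, audio_seconds, runs=3):
    """Return best-of-N processing time divided by audio duration.

    transcribe:    any callable taking an audio path (hypothetical wrapper
                   around whichever engine is being measured).
    audio_seconds: length of the audio file in seconds, supplied by the caller.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    best = float("inf")
    for _ in range(runs):
        started = time.perf_counter()
        transcribe(audio_path)
        best = min(best, time.perf_counter() - started)
    return best / audio_seconds
```

A result of 0.1 would mean an hour of audio takes roughly six minutes. Taking the best of several runs smooths over one-off costs such as model loading, which may or may not be what you want to measure.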