How to start with Kaldi and Speech Recognition

Kaldi is an open-source toolkit for working with speech data. It is used in voice-related applications, mostly for speech recognition but also for other tasks such as speaker recognition and speaker diarisation. The toolkit is already fairly old (around 7 years old) but is still actively updated and developed by a fairly large community. Kaldi is widely adopted in both academia (400+ citations in 2015) and industry.

Kaldi is written mainly in C/C++, but the toolkit is wrapped with Bash and Python scripts. For basic usage, this wrapping spares you from having to dig deep into the source code. Over the last 5 months I learned about the toolkit and how to use it. The goal of this article is to guide you through that process and share the materials that helped me most. See it as a shortcut.

You can read the rest of the article on Medium.