It returns an empty result when it encounters OOV (out-of-vocabulary) words. I tested it on both unstripped and stripped audio, with both the large and the small models, but it did not do well.
#HOW TO USE CEPSTRAL VOICES RASPBERRY PI OFFLINE#
Use acoustic features to do analysis and detect commands. DTW (Dynamic Time Warping) belongs to this second category and is similar to template matching. DTW calculates the cost of matching one piece of audio against a template audio, and we pick the template with the lowest cost. This method does not need training, so it still applies when you want to add new commands. The bad thing is that the calculation is time-consuming. But at least the command audios are short, and we can find ways to eliminate the silence and extract MFCC (Mel-frequency Cepstral Coefficients) features.

There is also a demo, Speech Command Recognition with torchaudio - PyTorch Tutorials, by the official PyTorch team. But with that approach we need to re-train the model whenever new commands come in.

For eliminating silence, I was inspired by the blog Audio Handling Basics: Process Audio Files In Command-Line or Python | Hacker Noon. The blog mentions that we can eliminate the silent part of an audio recording according to the short-term energy of the audio data. A Python library called librosa provides some functions for doing that.

Offline recognition: provides lightweight tflite models for low-resource devices.
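The silence trimming and DTW matching described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the code used in the project: the frame length, hop size, and energy threshold ratio are arbitrary choices, the function names are hypothetical, and the feature vectors stand in for real MFCC frames (which a library such as librosa could supply).

```python
import math


def frame_energies(samples, frame_len=400, hop=200):
    """Short-term energy (mean of squared samples) of each frame."""
    if not samples:
        return []
    energies = []
    for start in range(0, max(len(samples) - frame_len + 1, 1), hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    return energies


def trim_silence(samples, frame_len=400, hop=200, ratio=0.1):
    """Keep the span between the first and last frame whose energy is at
    least ratio * max energy. The ratio is an assumed threshold."""
    energies = frame_energies(samples, frame_len, hop)
    if not energies or max(energies) == 0:
        return samples
    threshold = ratio * max(energies)
    voiced = [i for i, e in enumerate(energies) if e >= threshold]
    start = voiced[0] * hop
    end = min(voiced[-1] * hop + frame_len, len(samples))
    return samples[start:end]


def dtw_cost(a, b):
    """Classic DTW cost between two sequences of feature vectors,
    using Euclidean distance between frames."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of a
                                 cost[i][j - 1],      # skip a frame of b
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m]


def best_command(query, templates):
    """Return the name of the template with the lowest DTW cost."""
    return min(templates, key=lambda name: dtw_cost(query, templates[name]))


# Hypothetical one-dimensional "features" standing in for MFCC frames.
templates = {"sit": [[0.0], [1.0], [2.0]], "walk": [[5.0], [5.0], [5.0]]}
query = [[0.0], [0.0], [1.0], [2.0]]
print(best_command(query, templates))
```

The nested loop in `dtw_cost` is what makes the method slow: the work grows with the product of the two sequence lengths, so trimming silence before matching directly reduces the cost.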
![how to use cepstral voices raspberry pi](https://www.mdpi.com/electronics/electronics-10-02697/article_deploy/html/images/electronics-10-02697-g004.png)
Speech2Text: convert the speech to text, and then look up the commands in the text. One good thing is that this can be combined with NLP applications, but it is overkill here.
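The lookup step can be as simple as a word-boundary search over the recognized text. The sketch below assumes the recognizer hands back a plain string; the command list and the `find_commands` helper are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical command list, for illustration only.
COMMANDS = ("sit", "stand up", "walk forward")


def find_commands(text, commands=COMMANDS):
    """Return the commands found in text, in order of first appearance."""
    # Lowercase and drop punctuation so "Sit," still matches "sit".
    normalized = " ".join(re.findall(r"[a-z']+", text.lower()))
    hits = []
    for command in commands:
        match = re.search(r"\b" + re.escape(command) + r"\b", normalized)
        if match:
            hits.append((match.start(), command))
    return [command for _, command in sorted(hits)]


print(find_commands("Please sit, then Walk forward!"))
```

An empty transcript simply produces an empty list, so the caller can treat "nothing recognized" and "no command found" the same way.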
![how to use cepstral voices raspberry pi](https://d3i71xaburhd42.cloudfront.net/a26953f402e6b4999657802c805251b51fcb5878/3-Table1-1.png)
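Voice Control Petoi with Raspberry Pi Model 3 A+

The conclusion is that I use VAD (Voice Activity Detection) + DTW + Vosk. I used PyAudio at the beginning, but it is an old library, so I used Sounddevice and Soundfile instead.

From a functional point of view, the methods to do this can be divided into two categories: converting speech to text and then looking up the commands in the text, or using acoustic features to detect the commands directly.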