A Totally Offline Use of Whisper ASR and the LLaMA-2 GPT Model
Nowadays, nobody is surprised by running a deep learning model in the cloud. However, the situation can be far more complicated in the edge or consumer-device world. There are several reasons for that. First, the use of cloud APIs requires devices to always be online. This is not a problem for a web service but can be a dealbreaker for a device that should be functional without Internet access. Second, cloud APIs cost money, and customers will likely not be happy to pay yet another subscription fee. Last but not least, after several years, the project may be finished, the API endpoints will be shut down, and the expensive hardware will turn into a brick. This is naturally not friendly to customers, the ecosystem, or the environment. That’s why I’m convinced that end-user hardware should be fully functional offline, without extra costs or the use of online APIs (online access can be optional but not mandatory).
In this article, I’ll show how to run a LLaMA GPT model and automatic speech recognition (ASR) on a Raspberry Pi. This will allow us to ask the Raspberry Pi questions and get answers. And as promised, all of this will work fully offline.
Let’s get into it!
The code presented in this article is intended to work on the Raspberry Pi. But most of the methods (except the “display” part) will also work on a Windows, OSX, or Linux laptop. So, readers who don’t have a Raspberry Pi can easily test the code without any problems.
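To make the end goal concrete, here is a minimal sketch of the offline pipeline this article builds toward: transcribing a voice recording with Whisper and feeding the resulting text to a LLaMA-2 model via the llama-cpp-python library. The file names `audio.wav` and `llama-2-7b-chat.Q4_K_M.gguf` are placeholders I chose for illustration; both models are assumed to have been downloaded in advance, so nothing here touches the network at inference time.

```python
# A minimal sketch of the offline ASR + GPT pipeline. Assumes the
# openai-whisper and llama-cpp-python packages are installed and the
# model files were downloaded beforehand; no network access is needed.
import whisper
from llama_cpp import Llama

# 1. Speech-to-text: the "tiny" Whisper model is small enough for a Raspberry Pi.
asr_model = whisper.load_model("tiny")
question = asr_model.transcribe("audio.wav")["text"]  # placeholder file name

# 2. Text generation: a 4-bit quantized LLaMA-2 model in GGUF format.
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf")  # placeholder path
answer = llm(f"Q: {question} A:", max_tokens=128, stop=["Q:"])

print(answer["choices"][0]["text"])
```

The rest of the article fills in the details around this skeleton: recording audio, choosing model sizes that fit the board, and showing the output on a display.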
Hardware
For this project, I will be using a Raspberry Pi 4. It’s a single-board computer running Linux; it’s small and requires only 5V DC power, without fans or active cooling:
A newer 2023 model, the Raspberry Pi 5, should be even better; according to benchmarks, it’s almost 2x faster. But it is also almost 50% more expensive, and for our test, the model 4 is good enough.
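Whatever board you use, the LLaMA model will have to fit into its RAM, so it may be worth checking how much memory is available before going further. A quick sanity check on Linux (including Raspberry Pi OS) could look like the sketch below; the `os.sysconf` names are POSIX-specific, so this is illustrative rather than portable:

```python
# A quick sanity check of total system memory on Linux / Raspberry Pi OS.
# (Illustrative only; SC_PAGE_SIZE and SC_PHYS_PAGES are POSIX names.)
import os

ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
print(f"Total RAM: {ram_gb:.1f} GB")  # a 4-bit 7B LLaMA model needs roughly 4+ GB
```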