There’s also a chat version. The models can be found on the Hugging Face hub:
Falcon 180B is completely free and state-of-the-art. However, it is also an enormous model.
Can it run on your computer?
Unless your computer is equipped for very intensive computing, it can't run Falcon 180B out of the box. You will need to upgrade your computer and use a quantized version of the model.
In this article, I explain how you can run Falcon-180B on consumer hardware. We will see that it can be reasonably affordable to run a 180-billion-parameter model on a modern computer. I also discuss several techniques that help reduce the hardware requirements.
The first thing you need to know is that Falcon 180B has 180 billion parameters stored as bfloat16. A (b)float16 parameter takes 2 bytes in memory.
When you load a model, the standard PyTorch pipeline works like this:
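As a quick sanity check, this memory footprint can be estimated with simple arithmetic (using 1 GB = 10⁹ bytes here; the 4-bit figure is an illustration of what quantization saves and ignores any per-block quantization overhead):

```python
# Back-of-the-envelope memory math for Falcon 180B.
params = 180e9  # 180 billion parameters

# bfloat16: 2 bytes per parameter
bf16_size_gb = params * 2 / 1e9
print(f"bfloat16: {bf16_size_gb:.0f} GB")  # 360 GB

# For comparison, a 4-bit quantized version stores ~0.5 byte per parameter
int4_size_gb = params * 0.5 / 1e9
print(f"4-bit quantized: {int4_size_gb:.0f} GB")  # 90 GB
```

This is why quantization matters so much here: going from bfloat16 to 4-bit shrinks the weights by a factor of 4.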
- An empty model is created: 180B parameters * 2 bytes = 360 GB