Experimenting with Large Language Models for Free
Everybody knows that large language models are, by definition, large. Not so long ago, they were available only to owners of high-end hardware, or at least to people who paid for cloud access or even for every API call. Nowadays, times are changing. In this article, I’ll show how to run the LangChain Python library, a FAISS vector database, and a Mistral-7B model in Google Colab completely for free, and we will do some fun experiments with it.
Components
There are many articles here on TDS about using large language models in Python, but often they are not easy to reproduce. For instance, many examples of using the LangChain library use an OpenAI class, the first parameter of which (guess what?) is OPENAI_API_KEY. Other examples of RAG (Retrieval Augmented Generation) and vector databases use Weaviate; the first thing we see after opening their website is “Pricing.” Here, I’ll use a set of open-source libraries that can be used completely for free:
- LangChain. It’s a Python framework for developing applications powered by language models. It is also model-agnostic, so the same code can be reused with different models.
- FAISS (Facebook AI Similarity Search). It’s a library designed for efficient similarity search and storage of dense vectors, which I’ll use for Retrieval Augmented Generation.
- Mistral 7B is a 7.3B parameter large language model (released under the Apache 2.0 license), which, according to the authors, outperforms the 13B Llama 2 on all benchmarks. It is also available on HuggingFace, so it is easy to use.
- Last but not least, Google Colab is also an important part of this test. It provides free access to Python notebooks powered by a CPU, a 16 GB NVIDIA Tesla T4, or even an 80 GB NVIDIA A100 (though I never saw the last one available for a free instance).
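To make the idea of "similarity search over dense vectors" more concrete, here is a minimal pure-Python sketch of an exhaustive nearest-neighbor search. It is not the FAISS API, only an illustration of the operation that a vector database performs; the helper names (l2_distance, nearest) and the toy three-dimensional "embeddings" are made up for this example, and real embeddings typically have hundreds of dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=1):
    """Return the indices of the k stored vectors closest to the query.

    This is a brute-force scan over all vectors; libraries such as FAISS
    exist to make this kind of search fast on large collections.
    """
    ranked = sorted(range(len(vectors)),
                    key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Hypothetical toy data: each document is represented by a small vector
documents = ["cats", "dogs", "cars"]
vectors = [
    [0.9, 0.1, 0.0],  # "cats"
    [0.8, 0.2, 0.1],  # "dogs"
    [0.0, 0.1, 0.9],  # "cars"
]

query = [0.9, 0.1, 0.05]
top = nearest(query, vectors, k=2)
print([documents[i] for i in top])
```

In a RAG setup, the vectors would come from an embedding model, and the documents returned by the search would be passed to the language model as context.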
Without further ado, let’s get into it.
Install
As a first step, we need to open Google Colab and create a new notebook. The required libraries can be installed using pip in the first cell: