It has been barely a year since GPT stardust ✨ covered almost every sector worldwide. More and more experts, from every field, are eager to use Large Language Models (LLMs) to optimise their workflows. Naturally, the corporate world could not sit out this new trend. The future promises unprecedented possibilities, yet they come wrapped in a matching… cost.
The scope of this project is to demonstrate an end-to-end solution for leveraging LLMs in a way that mitigates the privacy and cost concerns. We’ll use LLMWare, an open-source framework for developing industrial-grade enterprise LLM apps, the Retrieval Augmented Generation (RAG) method [1], and BLING, a newly introduced collection of open-source small models that run solely on CPU.
Concept
After successfully predicting Jrue Holiday’s 🏀 transfer to the Milwaukee Bucks, Data Corp took on a brand new project: helping a FinTech SME optimise its decision-making with AI. That is, to build a tool that can process the hundreds of thousands(!) of proprietary docs, query state-of-the-art GPT-like models and supply Managers with concise, optimal information. That’s all well and good, but it comes with two major pitfalls:
- Security: Querying a commercial LLM (e.g. GPT-4) essentially means sharing proprietary information over the web (what about all those hundreds of thousands of docs?). A data breach would certainly compromise the firm’s integrity.
- Cost: An automated tool like the one above will boost the Managers’ productivity, but there is no free lunch. The anticipated daily queries might number in the hundreds and, given how ‘GPU-thirsty’ LLMs are, the aggregate cost could easily get out of control.
The above limitations led me to a challenging alternative:
How about developing a custom tool that can consume proprietary knowledge and…