Unlocking Cost-Efficiency in Large Compute Projects with Open Source LLMs and GPU Rentals
Introduction
In the world of large language models (LLMs), the cost of computation can be a significant barrier, especially for large projects. I recently embarked on a project that required running 4,000,000 prompts with an average input length of 1,000 tokens and an average output length of 200 tokens. That's nearly 5 billion tokens! The standard approach of paying per token, as is common with models like GPT-3.5 and GPT-4, would have resulted in a hefty bill. However, I found that by leveraging open source LLMs, I could shift the pricing model to paying per hour of compute time, resulting in substantial savings. This article details the approaches I took and compares and contrasts each of them. Please note that while I share my experience with pricing, prices are subject to change and will vary depending on your region and specific circumstances. The key takeaway here is the potential cost saving of leveraging open source LLMs and renting a GPU by the hour, rather than the specific prices quoted. If you plan on using my recommended solutions in your own project, I've left a few affiliate links at the end of this article.
ChatGPT API
I conducted an initial test using GPT-3.5 and GPT-4 on a small subset of my prompt input data. Both models performed commendably, but GPT-4 consistently outperformed GPT-3.5 in the majority of cases. To give you a sense of the cost, running all 4 million prompts through the OpenAI API would look something like this:
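The back-of-the-envelope calculation can be sketched in Python. The per-1K-token rates below are the OpenAI list prices for GPT-3.5 Turbo (4K context) and GPT-4 (8K context) at the time of writing; they change often, so treat the dollar figures as illustrative rather than definitive.

```python
# Rough cost estimate for pushing the whole workload through the OpenAI API.
# Assumed rates (subject to change):
#   GPT-3.5 Turbo: $0.0015 / 1K input tokens, $0.002 / 1K output tokens
#   GPT-4:         $0.03   / 1K input tokens, $0.06  / 1K output tokens

N_PROMPTS = 4_000_000
AVG_INPUT_TOKENS = 1_000
AVG_OUTPUT_TOKENS = 200

def api_cost(input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total cost in dollars for the full batch of prompts."""
    input_tokens = N_PROMPTS * AVG_INPUT_TOKENS    # 4.0B input tokens
    output_tokens = N_PROMPTS * AVG_OUTPUT_TOKENS  # 0.8B output tokens
    return (input_tokens / 1_000) * input_price_per_1k \
         + (output_tokens / 1_000) * output_price_per_1k

gpt35_cost = api_cost(0.0015, 0.002)  # -> 7600.0
gpt4_cost = api_cost(0.03, 0.06)      # -> 168000.0

print(f"GPT-3.5 Turbo: ${gpt35_cost:,.0f}")
print(f"GPT-4:         ${gpt4_cost:,.0f}")
```

At these rates the batch comes to about $7,600 on GPT-3.5 Turbo and roughly twenty times that on GPT-4, which frames the comparison that follows.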
While GPT-4 did offer some performance advantages, its cost was disproportionately high compared to the incremental quality it added to my outputs. Conversely, GPT-3.5 Turbo, although more cost-effective, fell short on performance, making noticeable errors on 2–3% of my prompt inputs. Given these factors, I wasn't prepared to invest $7,600 in a project that was…