The History of Open-Source LLMs: Better Base Models (Part Two)
Early Days of Open-Source LLMs

How LLaMA, MPT, Falcon, and LLaMA-2 put open-source LLMs on the map…

Towards Data Science
(Photo by Iñaki del Olmo on Unsplash)

Open-source research on large language models (LLMs) is incredibly valuable, as it aims to democratize a powerful and influential technology. Although open-source LLMs are now commonly used and widely studied, this area of research initially faced struggles that were difficult to overcome. Namely, open-source LLMs performed poorly at first and were heavily criticized. Within this overview, we will study a line of research that changed this narrative by making high-performing pre-trained LLMs available to everyone. Given that pre-training a language model is so expensive, the models we will study here are especially impactful. After these high-performing base models were created and released, many people could conduct research using them at marginal added cost.

“The capabilities of LLMs are remarkable considering the seemingly straightforward nature of the training methodology.” — from [14]

The current series. This overview is part two of a three-part series on the history of open-source LLMs. The first part of the series covered initial attempts at creating open-source LLMs. Here, we will study the most popular open-source base models (i.e., language models that have been pre-trained but not fine-tuned or aligned) currently available. Next time, we will go over how these models can be fine-tuned or aligned to create a variety of useful applications.

(from [10, 12, 14, 15])

In part one of this series, we saw that the early days of research on open-source LLMs resulted in the proposal of several important base models, such as OPT and BLOOM. However, these models were widely considered to perform quite poorly compared to closed-source pre-trained models (e.g., GPT-3). How can we solve this? First, we need to take a deeper look at the LLM training process.

Training pipeline. LLMs are trained in several steps, as shown in the figure below. First, we pre-train the model…
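
To make the pre-training step concrete, here is a minimal, hypothetical sketch (using PyTorch with a toy model and random token IDs; none of this code comes from the article or any released model) of the next-token prediction objective that underlies language model pre-training: given a sequence of tokens, the model is trained to predict the token at each following position.

```python
# Minimal sketch of next-token prediction pre-training (toy model, random data).
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch_size = 1000, 64, 16, 4

# Toy "language model": embedding -> one transformer layer (causal mask) -> vocab logits.
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

# Random token IDs stand in for a real tokenized text corpus.
tokens = torch.randint(0, vocab_size, (batch_size, seq_len))

# Causal mask so each position can only attend to earlier positions.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
hidden = layer(embed(tokens[:, :-1]), src_mask=causal_mask)  # inputs: all but the last token
logits = lm_head(hidden)                                     # predicted distribution over next tokens

# Targets are the same sequence shifted left by one position.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
loss.backward()  # an optimizer step would follow in a real training loop
```

At the scale of real base models like LLaMA or Falcon, roughly this same objective is simply applied over trillions of tokens of curated text with far larger transformer architectures.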
