A new development in large language models has emerged with the release of OpenLLaMA, an open-source reproduction of Meta AI’s LLaMA model. The creators of OpenLLaMA have made the permissively licensed model publicly available as a 7B OpenLLaMA model trained on 200 billion tokens. The release includes PyTorch and JAX weights of the pre-trained OpenLLaMA models, evaluation results, and a comparison against the original LLaMA models. This development has significant implications for machine learning, particularly for researchers who require large language models but face challenges accessing proprietary models.
The creators of OpenLLaMA have shared details on how they trained their models on the RedPajama dataset, a reproduction of the LLaMA training dataset containing over 1.2 trillion tokens. They followed the same preprocessing and training hyperparameters as the original LLaMA paper, including model architecture, context length, training steps, learning rate schedule, and optimizer. The only difference between their approach and the original one is the dataset used: OpenLLaMA employs the RedPajama dataset rather than the one used by the original LLaMA.
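For concreteness, the sketch below shows the kind of hyperparameter configuration this describes. The values are taken from what the original LLaMA paper reports for its 7B model (architecture dimensions, AdamW settings, cosine learning rate schedule); they are illustrative of the recipe being followed, not OpenLLaMA’s published training configuration.

```python
# Hedged sketch: hyperparameters of the kind the original LLaMA paper
# reports for the 7B model, which OpenLLaMA states it follows. These are
# illustrative values, not OpenLLaMA's released config files.
from dataclasses import dataclass


@dataclass
class Llama7BStyleConfig:
    # Model architecture
    dim: int = 4096            # hidden size
    n_layers: int = 32
    n_heads: int = 32
    context_length: int = 2048
    vocab_size: int = 32000
    # Optimizer and schedule (AdamW with cosine decay, per the LLaMA paper)
    learning_rate: float = 3.0e-4
    adam_beta1: float = 0.9
    adam_beta2: float = 0.95
    weight_decay: float = 0.1
    grad_clip: float = 1.0
    warmup_steps: int = 2000
    lr_schedule: str = "cosine"


if __name__ == "__main__":
    print(Llama7BStyleConfig())
```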
The models were trained on cloud TPU-v4s using EasyLM, a JAX-based training pipeline developed for training and fine-tuning language models. The team employed a combination of normal data parallelism and fully sharded data parallelism (also known as ZeRO stage 3) to balance training throughput and memory usage. Overall, the training run achieved a throughput of over 1,900 tokens per second per TPU-v4 chip.
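To illustrate what combining data parallelism with fully sharded (ZeRO-3-style) parameter sharding looks like in JAX, here is a minimal sketch using a 2D device mesh with `dp` and `fsdp` axes. This is not EasyLM’s actual code; the array shapes and axis layout are assumptions chosen to keep the example small and runnable on a single host.

```python
# Minimal sketch (not EasyLM's implementation): combine data parallelism
# ("dp" axis, splits the batch) with fully sharded data parallelism
# ("fsdp" axis, shards the weights) using a 2D JAX device mesh.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange whatever devices are available into a (dp, fsdp) grid.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("dp", "fsdp"))

# Batch is split across "dp"; weights are sharded along "fsdp".
batch_sharding = NamedSharding(mesh, P("dp", None))
weight_sharding = NamedSharding(mesh, P("fsdp", None))

# Small illustrative shapes; a real 7B model would use dim=4096 layers.
w = jax.device_put(jnp.ones((1024, 1024)), weight_sharding)  # sharded weight
x = jax.device_put(jnp.ones((8, 1024)), batch_sharding)      # per-replica batch


@jax.jit
def forward(x, w):
    # Under jit, XLA inserts the all-gathers / reduce-scatters that
    # ZeRO-style weight sharding requires.
    return x @ w


y = forward(x, w)
print(y.shape, y.sharding)
```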
The performance of OpenLLaMA was evaluated on several tasks using the lm-evaluation-harness. The results were compared against the original LLaMA model and GPT-J, a 6B-parameter model trained on the Pile dataset by EleutherAI. The evaluation metrics for the original LLaMA model were generated by running it on the same tasks. The results for the LLaMA model differed slightly from those reported in the original LLaMA paper, which may be due to differences in evaluation protocols. Nonetheless, according to the presented results, OpenLLaMA exhibited comparable or better performance than the original LLaMA and GPT-J across most tasks. Although OpenLLaMA was trained on 200 billion tokens, versus the 1 trillion tokens used for the original LLaMA and the 500 billion tokens used for GPT-J, its performance is expected to improve further once training on the full 1 trillion tokens is complete.
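For readers who want to reproduce this kind of evaluation, recent versions of EleutherAI’s lm-evaluation-harness expose a `simple_evaluate` Python API; a hedged sketch is below. The model identifier and task list are illustrative assumptions, not the exact setup or checkpoint name used by the OpenLLaMA team, and the harness interface may differ between versions.

```python
# Hedged sketch of running EleutherAI's lm-evaluation-harness
# (pip install lm-eval) against a Hugging Face causal LM checkpoint.
# The model id and task list below are illustrative, not the authors' setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face causal LM backend
    model_args="pretrained=openlm-research/open_llama_7b_preview_200bt",  # assumed repo name
    tasks=["hellaswag", "arc_easy", "piqa"],  # a subset of common zero-shot tasks
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. accuracy
```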
To encourage feedback and collaboration from the community, the team behind OpenLLaMA has released a preview checkpoint of their weights. These weights are available in two formats: an EasyLM format for use with their EasyLM framework and a PyTorch format for use with the Hugging Face transformers library. Unlike the original LLaMA model, OpenLLaMA’s tokenizer and weights are trained entirely from scratch, so obtaining the original LLaMA tokenizer and weights is no longer necessary. However, it is important to note that OpenLLaMA uses the BOS (beginning of sentence) token (id=1) during training, so this token should be prepended for optimal performance during few-shot evaluation. The preview checkpoint weights and the EasyLM framework are permissively licensed under the Apache 2.0 license. The team is currently focused on completing the training process on the full RedPajama dataset to allow for an apples-to-apples comparison between the original LLaMA and OpenLLaMA. In addition, they are working on training a smaller 3B model for low-resource use cases. The team plans to release more updates soon.
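As a quick illustration of using the PyTorch-format weights with transformers while respecting the BOS note above, here is a minimal sketch. The Hugging Face repository name is an assumption for illustration; check the project’s GitHub page for the exact checkpoint id and any tokenizer-specific loading instructions.

```python
# Minimal sketch: load the PyTorch-format preview weights with the
# Hugging Face transformers library and prepend the BOS token (id=1)
# as the OpenLLaMA team recommends. The model id below is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_7b_preview_200bt"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Q: What is the capital of France?\nA:"
# add_special_tokens=True lets the tokenizer prepend BOS (id=1) automatically.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```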
Check out the GitHub link. Don’t forget to join our 20k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.