Since the introduction of OpenAI's revolutionary ChatGPT, which smashed records as the fastest product to reach 100 million users, considerable advances have been made in the field of natural language conversational agents. Researchers are actively exploring techniques to boost chatbot models' capabilities, allowing them to hold more natural and engaging interactions with their users. As a result, several open-source and lightweight alternatives to ChatGPT have been released, one of which is the ChatGLM model series developed by researchers at Tsinghua University, China. The series is built on the General Language Model (GLM) framework, which differs from the more commonly seen Generative Pre-trained Transformer (GPT) family of LLMs. It includes several bilingual models trained on Chinese and English, the best known of which is ChatGLM-6B, with 6.2 billion parameters. The model was pre-trained on over 1 trillion English and Chinese tokens and further fine-tuned for Chinese question answering, summarization, and conversational tasks using techniques such as reinforcement learning from human feedback.
Another standout feature of ChatGLM-6B is that it can be deployed locally and requires only modest resources thanks to its quantization techniques; it can even run on consumer-grade graphics cards. The model has become exceptionally popular, particularly in China, with over 2 million downloads worldwide, making it one of the most influential large-scale open-source models. Building on this widespread adoption, the Tsinghua University researchers have released ChatGLM2-6B, the second-generation version of the bilingual chat model. ChatGLM2-6B retains all the strengths of the first-generation model while adding several new features, such as performance improvements, support for longer contexts, and more efficient inference. Moreover, the research team has extended the use of the model weights beyond academic purposes (as was previously the case), making them available for commercial use as well.
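As an illustration of the local-deployment workflow, the sketch below loads the model through the Hugging Face transformers library and applies INT4 quantization to shrink the GPU memory footprint. The repository name "THUDM/chatglm2-6b" and the `quantize`/`chat` helpers follow the usage shown on the model's public model card, but exact method names and arguments may differ between releases, so treat this as an assumption-laden sketch rather than official usage.

```python
# Minimal sketch: running ChatGLM2-6B locally with INT4 quantization.
# Assumes the public "THUDM/chatglm2-6b" checkpoint and the quantize()/chat()
# helpers exposed by the model's remote code; verify against the model card.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# INT4 quantization reduces GPU memory enough for consumer-grade cards.
model = model.quantize(4).cuda().eval()

# The bundled chat() helper keeps a running history for multi-turn dialogue.
response, history = model.chat(tokenizer, "What is the GLM framework?", history=[])
print(response)
```

A usage note: skipping the `quantize(4)` call and loading in half precision instead trades higher memory use for better output quality, which is the usual knob when a larger GPU is available.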
As a starting point, the researchers upgraded the base model of ChatGLM2-6B relative to the first-generation version. ChatGLM2-6B uses the hybrid objective function of GLM and has been pre-trained on over 1.4 trillion English and Chinese tokens. The researchers evaluated the model against other competitive models of roughly the same size and found that ChatGLM2-6B achieves noticeable performance improvements on datasets such as MMLU, CEval, and BBH. Another impressive upgrade in ChatGLM2-6B is support for longer contexts, extended from 2K in the previous version to 32K. The FlashAttention algorithm has been instrumental here, speeding up attention and reducing memory consumption in the attention layer even for long sequences. Furthermore, the model has been trained with a context length of 8K during dialogue alignment to offer users more conversational depth. ChatGLM2-6B also uses the Multi-Query Attention technique, achieving lower GPU memory usage for the KV cache and roughly 42% faster inference compared to the first generation.
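To make the KV-cache argument concrete, the short sketch below compares the cache size of standard multi-head attention with Multi-Query Attention, where all query heads share a single key/value head. The layer count, head count, and dimensions are hypothetical placeholders, not ChatGLM2-6B's actual configuration; the point is simply that sharing K/V heads shrinks the cache by roughly a factor of the number of heads, which is what makes long 32K contexts affordable in GPU memory.

```python
# Illustrative KV-cache size comparison: multi-head vs. multi-query attention.
# All dimensions below are hypothetical, not ChatGLM2-6B's real configuration.

def kv_cache_bytes(layers, seq_len, num_kv_heads, head_dim, bytes_per_value=2):
    # The cache stores one key and one value vector per position,
    # per key/value head, per layer (fp16 -> 2 bytes per value).
    return 2 * layers * seq_len * num_kv_heads * head_dim * bytes_per_value

layers, heads, head_dim, seq_len = 28, 32, 128, 32_768

mha = kv_cache_bytes(layers, seq_len, num_kv_heads=heads, head_dim=head_dim)
mqa = kv_cache_bytes(layers, seq_len, num_kv_heads=1, head_dim=head_dim)

print(f"Multi-head attention KV cache: {mha / 2**30:.1f} GiB")
print(f"Multi-query attention KV cache: {mqa / 2**30:.2f} GiB")  # ~1/heads the size
```

A smaller cache also means less memory traffic per decoding step, which is one reason shared K/V heads tend to speed up autoregressive inference as well.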
The researchers at Tsinghua University have open-sourced ChatGLM2-6B in hopes of encouraging developers and researchers worldwide to advance the growth and innovation of LLMs and to build useful applications on top of the model. However, they also caution that, given the model's relatively small scale, its outputs can be influenced by randomness and therefore need to be carefully fact-checked for accuracy. As for future work, the team is already thinking one step ahead and has started working on the third version of the model, ChatGLM3.
Check out the GitHub Link. All Credit For This Research Goes To the Researchers on This Project. Also, don't forget to join our 18k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in various challenges.