
Large Language Models (LLMs) and Generative AI, resembling GPT engines, have been creating big waves within the AI domain recently, and there’s a giant hype available in the market, each amongst retail individuals and corporates, to ride this recent tech wave. Nevertheless, as this technology is rapidly taking on multiple use cases available in the market, we’d like to pay more attention to the safety features of it and listen in greater detail to the chance related to its usage, especially the open-source LLMs.
In recent research conducted by Rezilion, a renowned automated software supply chain security platform, experts have investigated this exact issue, and the findings will surprise us. They considered all of the projects that fit these criteria:
- Projects should have been created eight months ago or less (approx November 2022, to June 2023, on the time of this paper’s publication)
- Projects are related to the topics: LLM, ChatGPT, Open-AI, GPT-3.5, or GPT-4
- Projects should have a minimum of 3,000 stars on GitHub.
These criteria have ensured that every one the main projects come under the research.
To articulate their research, they’ve used a framework called OpenSSF Scorecard. Scorecard is a SAST tool created by the Open Source Security Foundation (OSSF). Its goal is to evaluate the safety of open-source projects and help improve them. The assessment relies on different facts in regards to the repository, resembling its variety of vulnerabilities, how often it’s being maintained, if it incorporates binary files and lots of more.
The aim of all of the checks together is to make sure adherence to security best practices and industry standards. Each check has a risk level related to it. The danger level represents the estimated risk related to not adhering to a selected best practice and adds weight to the rating accordingly.
Currently, 18 checks could be divided into three themes: holistic security practices, source code risk assessment, and construct process risk assessment. The OpenSSF Scorecard assigns an ordinal rating between 0 to 10 and a risk level rating for every check.
It seems that nearly all of those LLMs (open-sourced) and projects cope with major security concerns, which the experts have categorized as follows:
1.Trust boundary risk
Risks resembling inadequate sandboxing, unauthorized code execution, SSRF vulnerabilities, insufficient access controls, and even prompt injections fall under the overall concept of trust boundaries.
Anyone can inject any malicious nlp masked command, which may cross multiple channels and severely affect the complete software chain.
One in all the favored examples is CVE-2023-29374 Vulnerability in LangChain (third hottest open source gpt)
2. Data management Risk
Data leakage and training data poisoning fall under the info management risks category. These risks pertain to any machine-learning system and are usually not just restricted to Large Language Models.
Training data poisoning refers to deliberately manipulating an LLM’s training data or fine-tuning procedures by an attacker to introduce vulnerabilities, backdoors, or biases that may undermine the model’s security, effectiveness, or ethical behavior. This malicious act goals to compromise the integrity and reliability of the LLM by injecting misleading or harmful information throughout the training process.
3. Inherent Model Risk
These security concerns occur because of the limitation of the underlying ML model: inadequate AI alignment and overreliance on LLM-generated content.
4. Basic Security Best Practices
It consists of issues resembling improper error handling or insufficient access controls that fall under general security best practices. They’re common to not machine learning models generally and never specifically to LLMs.
The astonishing and concerning fact is the safety rating all these models have received. The average rating among the many checked projects was just 4.6 out of 10, the typical age was 3.77 months, and the typical variety of stars was 15,909. The projects that gain popularity comparatively quickly are rather more in danger than those built over an extended period.
The corporate has not only highlighted the safety issues these projects are coping with right away but has also extensively suggested the steps of their research that could be taken to mitigate these risks and make them safer within the longer run.
In conclusion, the corporate has highlighted the necessity for security protocols to be properly administered and ensured, has highlighted the particular security weak points, and has suggested changes that could be done to eradicate such risks. By taking comprehensive risk assessments and robust security measures, organizations can harness the facility of open-source LLMs while protecting sensitive information and maintaining a secure environment.
Don’t forget to affix our 25k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the newest AI research news, cool AI projects, and more. If you will have any questions regarding the above article or if we missed anything, be at liberty to email us at Asif@marktechpost.com
🚀 Check Out 100’s AI Tools in AI Tools Club
References:
- https://www.darkreading.com/tech-trends/open-source-llm-project-insecure-risky-use
- https://info.rezilion.com/explaining-the-risk-exploring-the-large-language-models-open-source-security-landscape
Anant is a Computer science engineer currently working as an information scientist with experience in Finance and AI products as a service. He’s keen to construct AI-powered solutions that create higher data points and solve every day life problems in an impactful and efficient way.