How Well Do LLMs Comply with the EU AI Act?
As generative artificial intelligence (AI) remains in the spotlight, there is a growing call to regulate this technology because it can negatively impact a large population very quickly. Those impacts can take the form of discrimination, perpetuated stereotypes, privacy violations, harmful biases, and the erosion of basic human values.
In June 2023, the US government announced a set of voluntary AI guidelines that several prominent companies, including Anthropic, Meta (Facebook), Google, Amazon, OpenAI, and Microsoft, agreed to follow.[1] This is a good step for the US, but unfortunately, it has long lagged behind the European Union in AI regulation. In my previous post, Generative AI Ethics: Key Considerations in the Age of Autonomous Content, I explored the EU AI Ethics Framework and provided a set of considerations for implementing it when large language models (LLMs) are used. This post focuses on the draft EU AI Act and how well LLMs comply with the draft legislation.
In June 2023, the EU passed the world's first draft regulation on AI. Building upon the AI ethics framework ratified in 2019, the EU's priority is to ensure that AI systems used in the EU are "safe, transparent, traceable, non-discriminatory, and environmentally friendly."[2] To avoid detrimental consequences, the EU framework insists that humans remain in the loop. In other words, companies can't simply let AI and automation run on their own.
The proposed law segments AI systems into three categories depending on the risk they pose to people, and each risk level requires a different level of regulation. If adopted, it would be the first comprehensive set of AI regulations in the world. The three risk tiers identified by the EU are unacceptable risk, high risk, and limited risk; a simple illustration of how the tiers map to obligations follows the list below.
- Unacceptable risk: technology that is harmful and poses a threat to human beings will be prohibited. Examples include cognitive or behavioral manipulation of people or specific vulnerable groups; social scoring of individuals based on their social standing or behavior; and real-time, remote biometric identification at scale, such as facial recognition surveillance. Now, we all know militaries around the world are focused on autonomous weapons, but I digress.
- High risk: AI systems that could negatively affect safety or fundamental rights and freedoms are split into two categories by the EU. The first category covers AI embedded in consumer products already governed by the EU's product safety regulations, such as toys, aircraft, automobiles, medical devices, and elevators. The second category covers systems that will have to be registered in an EU database, including biometrics, critical infrastructure operation, education and training, employment-related activities, law enforcement, border control, and legal interpretation of the law.
- Limited risk: at a minimum, limited-risk systems must meet transparency standards that give people the chance to make informed decisions. The EU stipulates that users should be notified whenever they are interacting with AI. It also requires that models be designed so they do not generate illegal content, and that model makers disclose what (if any) copyrighted material was used in training.
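As a quick illustration of how these tiers translate into different obligations, here is a minimal sketch mapping each tier to the kinds of duties described above. The tier names come from the draft act, but the obligation strings and the `obligations_for` helper are simplified paraphrases for illustration, not legal definitions.

```python
from enum import Enum

# Simplified illustration of the draft act's three risk tiers and the obligations
# described above. The obligation text is a paraphrase, not legal language.
class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"

OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: [
        "prohibited outright (e.g., social scoring, mass real-time biometric surveillance)",
    ],
    RiskTier.HIGH: [
        "conformity with existing EU product safety rules (toys, aircraft, medical devices, ...)",
        "registration in an EU database (biometrics, critical infrastructure, employment, ...)",
    ],
    RiskTier.LIMITED: [
        "notify users they are interacting with AI",
        "prevent generation of illegal content",
        "disclose copyrighted material used in training",
    ],
}

def obligations_for(tier: RiskTier) -> list[str]:
    """Return the simplified obligation list for a given risk tier."""
    return OBLIGATIONS[tier]

print(obligations_for(RiskTier.LIMITED))
```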
The EU AI Act will next be negotiated among member countries so they can vote on the final form of the law. The EU is targeting the end of the year (2023) for ratification.
Now, let’s turn to how current LLMs adhere to the draft act.
Researchers at Stanford University’s Center for Research on Foundation Models (CRFM) and Institute for Human-Centered Artificial Intelligence (HAI) recently published a paper titled Do Foundation Models Comply with the Draft EU AI Act? They extracted twenty-two requirements from the act, categorized them, and then created a 5-point rubric for twelve of the twenty-two requirements. All of the research, including criteria, rubrics, and scores, is available on GitHub under the MIT License.
The research team mapped the legislative requirements into the categories seen in Table 1.1. It should be noted that the team only evaluated twelve of the twenty-two requirements identified. In the end, they selected the twelve requirements that were most readily assessable using publicly available data and documentation provided by the model makers.
Table 1.1: LLM Compliance Table Summary
For those who may not be aware, the Stanford team has also painstakingly cataloged over 100 LLM datasets, models, and applications, which can be found in their ecosystem graphs. To keep things manageable, the researchers analyzed “10 foundation model providers–and their flagship foundation models–with 12 of the Act’s requirements for foundation models based on their [our] rubrics.”[3]
The researchers looked at models from OpenAI, Anthropic, Google, Meta, Stability.ai, and others. Their evaluation produced the following scorecard.
Figure 1.2: Grading Foundation Model Providers’ Compliance with the Draft EU AI Act
Overall, the researchers noted quite a bit of variability in compliance across providers (and these were only twelve of the twenty-two requirements): “some providers score less than 25% (AI21 Labs, Aleph Alpha, Anthropic) and only one provider scores at least 75% (Hugging Face/BigScience) at present.”[4]
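To make the grading concrete, here is a minimal sketch of how per-requirement rubric scores could be rolled up into the percentages quoted above. It assumes each of the twelve evaluated requirements is scored on a 0–4 scale (the 5-point rubric), for a maximum of 48 points per provider; the requirement labels are paraphrased and the example scores are made-up placeholders, not the study's actual data.

```python
# Illustrative only: roll up per-requirement rubric scores (0-4 each, i.e. a
# 5-point rubric) into a single compliance percentage for one provider.
# The scores below are made-up placeholders, not figures from the study.

RUBRIC_MAX = 4          # top score on the 0-4 rubric
NUM_REQUIREMENTS = 12   # requirements actually evaluated in the study

def compliance_pct(scores: dict[str, int]) -> float:
    """Return total rubric points as a percentage of the maximum possible."""
    if len(scores) != NUM_REQUIREMENTS:
        raise ValueError(f"expected {NUM_REQUIREMENTS} requirement scores, got {len(scores)}")
    return 100 * sum(scores.values()) / (RUBRIC_MAX * NUM_REQUIREMENTS)

# Hypothetical provider with placeholder scores; requirement labels paraphrased.
example_scores = {
    "data sources": 3, "data governance": 2, "copyrighted data": 1, "compute": 2,
    "energy": 1, "capabilities & limitations": 4, "risks & mitigations": 2,
    "evaluations": 3, "testing": 2, "machine-generated content": 3,
    "member states": 1, "downstream documentation": 3,
}

print(f"Example provider compliance: {compliance_pct(example_scores):.0f}%")  # ~56%
```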
I’d encourage you to read the full study, but the researchers stated that there is considerable room for improvement across all providers. They also identified several key ‘persistent challenges,’ which include:
● Ambiguous Copyright Issues: Many of the foundation models were trained on data scraped from the internet, a significant portion of which is likely protected by copyright. Nonetheless, most providers do not clarify the copyright status of their training data. The legal implications of using and reproducing copyrighted data, particularly with respect to licensing terms, are not well-defined and are currently the subject of active litigation in the United States (see the Washington Post: AI learned from their work. Now they want compensation; and Reuters: US judge finds flaws in artists’ lawsuit against AI companies). We’ll have to see how this plays out over time.
● Lack of Risk Mitigation Disclosure: As mentioned in the introduction, AI has the potential to negatively impact many people very quickly, so understanding LLM risks is critical. Nonetheless, the majority of foundation model providers ignore the risk disclosures called for in the draft legislation. Although many providers list risks, very few detail the steps they have taken to mitigate them. Although not a generative AI case, there is a recent lawsuit against US health insurer Cigna Healthcare alleging that it used AI to deny payments (Axios: AI lawsuits spread to health). Bill Gates penned a good article titled The risks of AI are real but manageable, which I encourage you to read.
● Evaluation and Auditing Deficit: There is a dearth of consistent benchmarks for evaluating the performance of foundation models, especially in areas like potential misuse and model robustness. The US CHIPS and Science Act mandates that the National Institute of Standards and Technology (NIST) create standardized evaluations for AI models. The ability to evaluate and monitor models was the focus of the GenAIOps framework I recently discussed. Eventually, we’ll see GenAIOps, DataOps, and DevOps come together under a common framework, but we’re a ways off.
● Inconsistent Energy Consumption Reports: I think many of us have experienced the recent heat waves around the world. For LLMs, foundation model providers vary widely in how they report energy use and related emissions. In fact, the researchers cite other work suggesting that we don’t yet even agree on how to measure and account for energy usage. Nnlabs.org reported the following: “According to OpenAI, GPT-2, which has 1.5 billion parameters, required 355 years of single-processor computing time and consumed 28,000 kWh of energy to train. In comparison, GPT-3, which has 175 billion parameters, required 355 years of single-processor computing time and consumed 284,000 kWh of energy to train, which is 10 times more energy than GPT-2. BERT, which has 340 million parameters, required 4 days of training on 64 TPUs and consumed 1,536 kWh of energy.”[5] A rough back-of-the-envelope conversion of these figures into emissions is sketched below.
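To show why consistent reporting matters, here is a minimal sketch that converts the kWh figures quoted above into approximate CO2-equivalent emissions. The grid carbon intensity (0.4 kg CO2e per kWh) and the PUE multiplier are assumed round numbers for illustration, not figures from the study; real estimates vary widely by region, year, and hardware.

```python
# Rough illustration: convert reported training energy (kWh) into approximate
# CO2-equivalent emissions. Carbon intensity and PUE are assumed placeholders.

GRID_KG_CO2E_PER_KWH = 0.4  # assumed average grid carbon intensity (kg CO2e per kWh)
PUE = 1.2                   # assumed data-center power usage effectiveness multiplier

# Training energy figures as quoted in the Nnlabs.org excerpt above.
reported_training_kwh = {
    "BERT (340M params)": 1_536,
    "GPT-2 (1.5B params)": 28_000,
    "GPT-3 (175B params)": 284_000,
}

for model, kwh in reported_training_kwh.items():
    tonnes_co2e = kwh * PUE * GRID_KG_CO2E_PER_KWH / 1000  # kg -> metric tonnes
    print(f"{model}: ~{tonnes_co2e:.1f} t CO2e (at {GRID_KG_CO2E_PER_KWH} kg/kWh, PUE {PUE})")
```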
In addition to the above, there are many other issues to contend with when implementing generative AI within your organization.
Based on the research, there is still a long road ahead for providers and adopters of generative AI technologies. Lawmakers, system designers, governments, and organizations must work together to address these important issues. As a starting point, we can make sure that we are transparent in our design, implementation, and use of AI systems. For regulated industries, this could be a challenge, as LLMs often have billions of parameters. Billions! How can something be interpretable and transparent with that many parameters? These systems need clear, unambiguous documentation, and we need to respect intellectual property rights. To support environmental, social, and corporate governance (ESG), we will also need a standardized framework for measuring and reporting energy consumption. Most importantly, AI systems must be safe, respect privacy, and uphold human values. We need to take a human-centered approach to AI.