When LLMs give us outputs that reveal flaws in human society, do we choose to listen to what they tell us?

By now, I’m sure most of you have heard the news about Google’s latest LLM*, Gemini, generating pictures of racially diverse people in Nazi uniforms. This little news blip reminded me of something I’ve been meaning to discuss, which is what happens when models have blind spots, and we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.
This kind of thing isn’t that unusual in machine learning, in my experience, especially when you have flawed or limited training data. A great example that I remember from my own work was predicting when a package was going to be delivered to a business office. Mathematically, our model could be excellent at estimating exactly when the package would get physically near the office, but sometimes truck drivers arrive at destinations late at night and then rest in their truck or in a hotel until morning. Why? Because nobody’s in the office to receive/sign for the package outside of business hours.
Teaching a model the concept of “business hours” is very difficult, and the much easier solution was simply to say, “If the model says the delivery will arrive outside business hours, add enough time to the prediction that it changes to the next hour the office is listed as open.” Easy! It solves the problem and it reflects the real circumstances on the ground. We’re just giving the model a little boost to help its results work better.
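As a rough sketch, a post-prediction rule like that can be very small. The schedule and helper name below are hypothetical, just to illustrate the shape of the fix:

```python
from datetime import datetime, timedelta

# Hypothetical schedule: the office is open 9-17 on weekdays (0 = Monday).
BUSINESS_HOURS = {0: (9, 17), 1: (9, 17), 2: (9, 17), 3: (9, 17), 4: (9, 17)}

def adjust_to_business_hours(predicted_eta: datetime) -> datetime:
    """If the raw prediction falls outside business hours, push it forward
    to the next hour the office is listed as open."""
    eta = predicted_eta
    while True:
        open_close = BUSINESS_HOURS.get(eta.weekday())
        if open_close and open_close[0] <= eta.hour < open_close[1]:
            return eta
        # Step forward one hour at a time until we land inside open hours.
        eta = (eta + timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)

# A Friday 10pm prediction gets pushed to Monday 9am.
print(adjust_to_business_hours(datetime(2024, 3, 1, 22, 0)))  # 2024-03-01 is a Friday
```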
However, this does cause some issues. For one thing, we now have two different model predictions to manage. We can’t just throw away the original model prediction, because that’s what we use for model performance monitoring and metrics. You can’t assess a model on predictions after humans got their paws in there; that’s not mathematically sound. But to get a clear sense of the real-world model impact, you do want to look at the post-rule prediction, because that’s what the customer actually experienced/saw in your application. In ML, we’re used to a very simple framing, where every time you run a model you get one result or set of results, and that’s that, but when you start tweaking the results before you let them go, you need to think at a different scale.
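In practice, that can just mean recording both numbers side by side. A minimal sketch of what that might look like (the class and field names are made up, and it reuses the hypothetical adjust_to_business_hours helper from above):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DeliveryPrediction:
    """Keep both versions of the prediction around."""
    raw_eta: datetime       # straight from the model; use for performance metrics
    adjusted_eta: datetime  # after the business-hours rule; what the customer sees

def make_prediction(model, features) -> DeliveryPrediction:
    raw_eta = model.predict(features)  # any model whose predict() returns a datetime
    return DeliveryPrediction(raw_eta=raw_eta,
                              adjusted_eta=adjust_to_business_hours(raw_eta))
```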
I kind of suspect that this is a version of what’s happening with LLMs like Gemini. However, instead of a post-prediction rule, the smart money says that Gemini and other models are applying “secret” prompt augmentations to try to change the results the LLMs produce.
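To make that concrete, a toy version of this kind of hidden prompt augmentation might look like the following. Everything here is invented for illustration; nobody outside these companies knows what the real augmentations say.

```python
import random

# Invented modifiers, purely for illustration.
DIVERSITY_MODIFIERS = [
    "Depict people of a variety of ethnicities and genders.",
    "Show a racially diverse group of people.",
]

def augment_prompt(user_prompt: str) -> str:
    """Silently append a diversity instruction to image prompts that mention people."""
    if any(word in user_prompt.lower() for word in ("person", "people", "man", "woman")):
        return f"{user_prompt} {random.choice(DIVERSITY_MODIFIERS)}"
    return user_prompt

# The user never sees the augmented prompt that actually goes to the model.
print(augment_prompt("Draw a person in a 1940s German military uniform"))
```

If the model follows the appended instruction literally, you get exactly the kind of result that kicked off this post: the rule has no notion of context, so it applies just as readily where it’s inappropriate.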
In essence, without this nudging, the model will produce results that reflect the content it has been trained on. That is to say, the content produced by real people. Our social media posts, our history books, our museum paintings, our popular songs, our Hollywood movies, etc. The model takes in all that stuff, and it learns the underlying patterns in it, whether they are things we’re proud of or not. A model given all the media available in our contemporary society is going to get a whole lot of exposure to racism, sexism, and myriad other forms of discrimination and inequality, to say nothing of violence, war, and other horrors. While the model is learning what people look like, and how they sound, and what they say, and how they move, it’s learning the warts-and-all version.
This means that if you ask the underlying model to show you a doctor, it’s probably going to be a white guy in a lab coat. This isn’t just random; it’s because in our modern society white men have disproportionate access to high-status professions like being doctors, because they on average have access to more and better education, financial resources, mentorship, social privilege, and so on. The model is reflecting back at us an image that may make us uncomfortable, because we don’t like to think about that reality.
The obvious argument is, “Well, we don’t want the model to reinforce the biases our society already has, we want it to improve representation of underrepresented populations.” I sympathize with this argument, quite a lot, and I care about representation in our media. However, there’s a problem.
Applying these tweaks is never going to be a sustainable solution. Recall the story I started with about Gemini. It’s like playing whac-a-mole, because the work never stops: now we have people of color being shown in Nazi uniforms, and that is understandably deeply offensive to a lot of folks. So, where we perhaps started by randomly appending “as a black person” or “as an indigenous person” to our prompts, we now have to add something more to exclude the cases where that’s inappropriate. But how do you phrase that, in a way an LLM can understand? We probably need to go back to the beginning, think about how the original fix works, and revisit the whole approach. In the best case, applying a tweak like this fixes one narrow issue with outputs, while potentially creating more.
Let’s play out another very real example. What if we add to the prompt, “Never use explicit or profane language in your replies, including [list of bad words here]”? Maybe that works for a lot of cases, and the model will refuse to say the bad words that a 13 year old boy requests to be funny. But sooner or later, this has unexpected additional side effects. What if someone’s looking for the history of Sussex, England? Alternately, someone’s going to come up with a bad word you left out of the list, so that’s going to be constant work to maintain. What about bad words in other languages? Who judges what goes on the list? I have a headache just thinking about it.
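The Sussex problem is easy to demonstrate with a toy blocklist filter. The words and matching logic below are invented; a real list would be far longer and never finished:

```python
import re

# A tiny, invented blocklist for illustration.
BLOCKLIST = ["sex", "damn"]

def naive_filter(text: str) -> bool:
    """Flag text as 'profane' by naive substring matching."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

def whole_word_filter(text: str) -> bool:
    """Match whole words only: fixes Sussex, but still misses misspellings,
    other languages, and every word nobody thought to add to the list."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in BLOCKLIST)

print(naive_filter("Tell me about the history of Sussex, England"))       # True (false positive)
print(whole_word_filter("Tell me about the history of Sussex, England"))  # False
```

Each refinement fixes one failure and leaves the rest of the maintenance treadmill untouched.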
Those are just two examples, and I’m sure you can think of more such scenarios. It’s like putting band-aid patches on a leaky pipe: every time you patch one spot, another leak springs up.
So, what is it we actually want from LLMs? Do we want them to generate a highly realistic mirror image of what human beings are actually like and how our society actually looks from the perspective of our media? Or do we want a sanitized version that cleans up the edges?
Honestly, I think we probably need something in the middle, and we have to continue to renegotiate the boundaries, even though it’s hard. We don’t want LLMs to reflect the real horrors and sewers of violence, hate, and more that human society contains; that is a part of our world that should not be amplified even slightly. Zero content moderation is not the answer. Fortunately, this motivation aligns with the desire of the large corporate entities running these models to be popular with the public and make lots of money.
However, I do want to keep making a gentle case for the fact that we can also learn something from this dilemma in the world of LLMs. Instead of simply being offended and blaming the technology when a model generates a bunch of images of a white male doctor, we should pause to understand why that’s what we got from the model. And then we should debate thoughtfully about whether the response from the model should be allowed, make a decision founded in our values and principles, and try to carry it out to the best of our ability.
As I’ve said before, an LLM isn’t an alien from another universe; it’s us. It’s trained on the things we wrote/said/filmed/recorded/did. If we want our model to show us doctors of various sexes, genders, races, etc., we need to make a society that allows all those different kinds of people to have access to that profession and the education it requires. If we’re worrying about how the model mirrors us, but not taking to heart the fact that it’s us who need to be better, not just the model, then we’re missing the point.