
Where Are All of the Women?


Exploring large language models’ biases in historical knowledge

Towards Data Science
A few of the top historical figures mentioned most often by GPT-4 and Claude. Individual images sourced from Wikipedia. Collage created by the author.

Large language models (LLMs) such as ChatGPT are increasingly being used in educational and professional settings. It is important to understand and study the various biases present in such models before integrating them into existing applications and our daily lives.

One of the biases I studied in my previous article concerned historical events. I probed LLMs to understand what historical knowledge they encoded in the form of major historical events. I found that they encoded a major Western bias in their understanding of those events.

In a similar vein, in this article I probe language models regarding their understanding of important historical figures. I asked two LLMs who the most important people in history were, repeating this process 10 times for each of 10 different languages. Some names, like Gandhi and Jesus, appeared extremely frequently. Other names, like Marie Curie or Cleopatra, appeared far less frequently. Compared to the number of male names generated by the models, there were extremely few female names.
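
For concreteness, here is a minimal sketch of what such a multilingual probing loop could look like using the current OpenAI and Anthropic Python SDKs. The prompt wording, model identifiers, language list, and run count below are illustrative placeholders of my own, not the exact setup used for the experiments.

```python
# Illustrative sketch of a multilingual probing loop (not the exact experimental code).
# Prompt text, model IDs, and the language list are placeholder assumptions.
from openai import OpenAI
import anthropic

openai_client = OpenAI()                  # reads OPENAI_API_KEY from the environment
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One pre-translated prompt per language (only English shown here).
PROMPTS = {
    "en": "Who are the 10 most important figures in history?",
    # ... the same question translated into the other 9 languages
}
N_RUNS = 10  # repeat each language to average over sampling variability

def ask_gpt4(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str) -> str:
    msg = anthropic_client.messages.create(
        model="claude-3-opus-20240229",   # placeholder model ID
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

# Collect raw responses for later name extraction and annotation.
responses = []  # (model, language, run, raw_text)
for lang, prompt in PROMPTS.items():
    for run in range(N_RUNS):
        responses.append(("gpt-4", lang, run, ask_gpt4(prompt)))
        responses.append(("claude", lang, run, ask_claude(prompt)))
```

The names in each raw response can then be extracted and labeled before any counting happens.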

The biggest question I had was: Where were all the women?

Continuing the theme of evaluating historical biases encoded by language models, I probed OpenAI’s GPT-4 and Anthropic’s Claude regarding major historical figures. In this article, I show that both models exhibit the following biases (a minimal sketch of how such tallies can be computed follows the list):

  • Gender bias: Both models disproportionately generate male historical figures. GPT-4 produced the names of female historical figures 5.4% of the time and Claude did so 1.8% of the time. This pattern held across all 10 languages.
  • Geographic bias: Regardless of the language the model was prompted in, there was a bias towards predicting Western historical figures. GPT-4 generated historical figures from Europe 60% of the time and Claude did so 52% of the time.
  • Language bias: Certain languages suffered more from gender or geographic biases. For example, when prompted in Russian, both GPT-4 and Claude generated zero women across all of my experiments. Moreover, language quality was lower for some languages. For example, when prompted in Arabic, the models were more likely to respond incorrectly by generating…
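
The percentages above come from tallying labeled names. As a minimal sketch of that bookkeeping (the record format, labels, and helper below are my own assumptions, since the analysis code is not shown here), the shares could be computed from annotated records like this:

```python
# Hypothetical annotated records: one row per generated name, with gender and
# region labels added after the fact (the annotation step itself is not shown).
records = [
    ("gpt-4", "en", "Marie Curie", "female", "Europe"),
    ("gpt-4", "en", "Mahatma Gandhi", "male", "Asia"),
    ("claude", "ru", "Isaac Newton", "male", "Europe"),
    # ... every name generated across all models, languages, and runs
]

def share(model: str, column: int, value: str) -> float:
    """Percentage of a model's generated names whose `column` equals `value`."""
    rows = [r for r in records if r[0] == model]
    return 100 * sum(r[column] == value for r in rows) / len(rows)

for model in ("gpt-4", "claude"):
    print(f"{model}: female {share(model, 3, 'female'):.1f}%, "
          f"Europe {share(model, 4, 'Europe'):.1f}%")
```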
