How well can AI models solve (and create) rebus puzzles?
What does it mean for an AI to be creative?
Last year, I wrote an article about measuring creativity in Large Language Models (LLMs) using several word-based creativity tests.
Since then, AI has developed rapidly and can now process and create both text and images. These models, sometimes known as “Multimodal Large Language Models” (MLLMs), are extremely powerful and have advanced abilities to understand complex textual and visual inputs.
In this article, I explore one way to measure creativity in two popular MLLMs: OpenAI’s GPT-4 Vision and Google’s Gemini Pro Vision. I use rebus puzzles, which are word puzzles that require combining both visual and language cues to solve.
Creativity is incredibly multi-faceted and difficult to define as a single trait. Therefore, in this article, I do not aim to measure creativity in general, but to gauge one very specific aspect of it.
Note [modified from my earlier article]: These experiments do not aim to measure how creative AI models are, but rather to measure the level of creative process present in their generations. I’m not claiming that AI models possess creative thinking in the same way humans do. Rather, I aim to show how the models respond to particular measures of creative processes.
A rebus puzzle is a picture representation of a common word or phrase. These puzzles often involve a mixture of visual and spatial cues. For instance, below are six examples of rebus puzzles (answers are at the top of the article).