Microsoft recently released a research paper titled Sparks of Artificial General Intelligence: Early Experiments with GPT-4. As described by Microsoft, the paper presents conclusive evidence that GPT-4 goes far beyond memorization, and that it has a deep and versatile understanding of concepts, skills, and domains. In fact, its ability to generalize far exceeds that of any human alive today.
While we’ve previously discussed the advantages of AGI, we should quickly summarize the general consensus on what an AGI system is. In essence, an AGI is a type of advanced AI that can generalize across multiple domains and is not narrow in scope. Examples of narrow AI include an autonomous vehicle, a chatbot, a chess bot, or any other AI designed for a single purpose.
An AGI, by comparison, would be able to flexibly switch between any of the above or any other field of expertise. It is an AI that can take advantage of emerging techniques such as transfer learning and evolutionary learning, while also exploiting established algorithms such as deep reinforcement learning.
The above description of AGI matches my personal experience using GPT-4, as well as the evidence shared in the research paper released by Microsoft.
One of the prompts outlined in the paper asks GPT-4 to write a proof of the infinitude of primes in the form of a poem.
If we analyze the requirements for creating such a poem, we realize that it demands mathematical reasoning, poetic expression, and natural language generation all at once, a challenge that would exceed the capability of most humans.
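For context, the mathematical content the poem has to carry is Euclid's classical argument. The paper does not prescribe a particular proof, so the following is just one standard formulation of what GPT-4 would need to express in verse:

```latex
% Euclid's proof of the infinitude of primes (standard formulation).
\begin{proof}
Suppose, for contradiction, that there are only finitely many primes
$p_1, p_2, \ldots, p_n$. Consider
\[
  N = p_1 p_2 \cdots p_n + 1 .
\]
Each $p_i$ divides $N - 1$, so dividing $N$ by any $p_i$ leaves
remainder $1$, and no $p_i$ divides $N$. But $N > 1$ has at least one
prime factor, which must therefore lie outside the list
$p_1, \ldots, p_n$, contradicting the assumption that the list
contained every prime. Hence there are infinitely many primes.
\end{proof}
```

GPT-4's poem has to encode exactly this chain of reasoning while also satisfying meter and rhyme, which is why the authors treat it as a test of combined skills rather than recall.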
The paper's authors wanted to know whether GPT-4 was simply producing content from general memorization, or genuinely understanding context and reasoning. When asked to recreate the poem in the style of Shakespeare, it was able to do so. This requires a multifaceted level of understanding that far exceeds the ability of the general population, combining theory of mind with mathematical skill.
How to Measure GPT-4's Intelligence?
The question then becomes: how do we measure the intelligence of an LLM? And is GPT-4 displaying true learning or mere memorization?
The standard way of testing an AI system is to evaluate it on a set of benchmark datasets, making sure that they are independent of the training data and that they cover a range of tasks and domains. This kind of testing is nearly impossible for GPT-4 because of the nearly unlimited quantity of data it was trained on, which makes true independence from the training set almost impossible to guarantee.
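To make the contamination problem concrete, here is a minimal sketch of a standard benchmark evaluation loop; the `ask_model` function and the two-item benchmark are hypothetical stand-ins, not anything from the paper:

```python
# Minimal sketch of a standard benchmark evaluation loop.
# ask_model() is a hypothetical stand-in for any LLM API call,
# and the two-item benchmark is a toy illustration.

def ask_model(question: str) -> str:
    """Hypothetical LLM call; swap in a real API client here."""
    raise NotImplementedError

benchmark = [
    ("What is the capital of France?", "Paris"),
    ("What is 17 * 23?", "391"),
]

def evaluate(items: list[tuple[str, str]]) -> float:
    # Score one point per exact-match answer, then average.
    correct = sum(
        1 for question, expected in items
        if ask_model(question).strip() == expected
    )
    return correct / len(items)

# The catch: this score only reflects understanding if no benchmark
# item leaked into the training data. With a web-scale corpus, that
# independence is nearly impossible to verify, so a high score may
# measure memorization rather than reasoning.
```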
Given that difficulty, the paper instead turns to generating novel and difficult tasks and questions that convincingly demonstrate that GPT-4 goes far beyond memorization, and that it has a deep and versatile understanding of concepts, skills, and domains.
When it comes to intelligence, GPT-4 can generate short stories and screenplays, and it can work through the most complicated formulas.
GPT-4 is also able to code at a very high level, both in terms of writing code from instructions and understanding existing code. It can handle a wide range of coding tasks, from coding challenges to real-world applications, from low-level assembly to high-level frameworks, and from simple data structures to complex programs such as games. GPT-4 can also reason about code execution, simulate the effects of instructions, and explain the results in natural language. It can even execute pseudocode, interpreting informal instructions that are not valid in any real programming language.
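To make "executing pseudocode" concrete, here is a minimal sketch of the kind of prompt involved; the toy pseudocode is my own, not an example from the paper, and the snippet assumes the pre-1.0 openai Python client (newer versions expose the same call as client.chat.completions.create):

```python
# Sketch: asking GPT-4 to act as an interpreter for informal pseudocode.
# Assumes the pre-1.0 openai Python client.

import openai

pseudocode = """
set total to 0
for each number from 1 to 5:
    add the square of the number to total
print total
"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Execute this pseudocode step by step and state "
                   "the final output:\n" + pseudocode,
    }],
)

# A correct trace ends with 55 (1 + 4 + 9 + 16 + 25).
print(response["choices"][0]["message"]["content"])
```

The point is not the API plumbing but that the model has to infer the semantics of instructions that would not compile in any programming language, and then simulate their execution.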
GPT-4 can offer rational responses to almost any prompt it is given. In fact, its intelligence is remarkable in its depth and breadth.
GPT-4 Limitations
The paper goes on to describe how the authors probed GPT-4's responses and behaviors to verify its consistency, coherence, and correctness, and to uncover its limitations and biases.
The most obvious limitation is its tendency to hallucinate. The paper gives an example of a hallucination in which GPT-4 was confident in an answer that was completely false.
It seems that even GPT-4 understands this current limitation; this was the response it gave when I asked it what a GPT hallucination is:
A GPT hallucination refers to a phenomenon where a Generative Pre-trained Transformer (GPT) model, like the one you are currently interacting with, produces a response that is not based on factual information or is not coherent with the context provided. These hallucinations occur when the model generates text that may seem plausible but is actually incorrect, misleading, or nonsensical.
In other words, if you train an LLM on the world's data, how do you prevent the system from learning incorrect information? A large language model learning and regurgitating misinformation and conspiracy theories could be one of the biggest pitfalls humanity faces with large-scale adoption of LLMs. It may in fact be one of the larger threats from AGI, and one that is surprisingly overlooked in discussions of AGI risk.
GPT-4 Proofs of Intelligence
The paper illustrates that no matter what kind of complex prompts were directed at it, GPT-4 would exceed expectations. As stated in the paper:
Its unparalleled mastery of natural language. It can not only generate fluent and coherent text, but also understand and manipulate it in various ways, such as summarizing, translating, or answering an extremely broad set of questions. Moreover, by translating we mean not only between different natural languages but also translations in tone and style, as well as across domains such as medicine, law, accounting, computer programming, music, and more.
Mock technical interviews were given to GPT-4, and it passed them easily, meaning that if a human had been on the other end, they would have been hired immediately as a software engineer. A similar preliminary test of GPT-4's competency on the Multistate Bar Exam showed an accuracy above 70%. This suggests that in the future we could automate many of the tasks currently given to lawyers. In fact, some startups are now working to build robot lawyers on top of GPT-4.
Producing New Knowledge
One of the arguments in the paper is that the one thing left for GPT-4 to do to prove true understanding is to produce new knowledge, such as proving new mathematical theorems, a feat that currently remains out of reach for LLMs.
Then again, this is the holy grail of AGI. While there are dangers in an AGI falling into the wrong hands, the benefits of an AGI able to rapidly analyze all historical data to discover new theorems, cures, and treatments are nearly infinite.
An AGI could be the missing link in finding cures for rare genetic diseases that currently lack private-industry funding, in curing cancer once and for all, and in maximizing the efficiency of renewable power to end our dependency on unsustainable energy. In fact, it could tackle any consequential problem fed into it. This is what Sam Altman and the team at OpenAI understand: an AGI is truly the last invention needed to solve most problems and to benefit humanity.
Of course, that does not solve the nuclear-button problem of who controls the AGI and what their intentions are. Regardless, this paper does an outstanding job of arguing that GPT-4 is a step toward achieving the dream AI researchers have held since 1956, when the Dartmouth Summer Research Project on Artificial Intelligence workshop was first held.
While it is debatable whether GPT-4 is an AGI, it can easily be argued that, for the first time in human history, we have an AI system that can pass the Turing Test.