Challenges of Detecting AI-Generated Text
Table of contents
Introduction
Constructing an intuition for text source detection
What’s the perplexity of a language model?
Computing the perplexity of a language model’s prediction
Detecting AI-generated text
Misinformation
What’s next?
Conclusion

We now have all the ingredients we need to check whether a piece of text is AI-generated. Here's everything we need:

  1. The text (sentence or paragraph) we wish to examine.
  2. The tokenized version of this text, produced by the same tokenizer that was used on the model's training dataset.
  3. The trained language model.

Using 1, 2, and 3 above, we can compute the following:

  1. Per-token probability as predicted by the model.
  2. Per-token perplexity using the per-token probability.
  3. Total perplexity for the complete sentence.
  4. The perplexity of the model on the training dataset.
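The first three quantities above can be sketched in a few lines of Python. Here, `token_probs` is a hypothetical input: the per-token probabilities your model assigns to each token of the sentence, however you extract them.

```python
import math

def per_token_perplexity(token_probs):
    """Per-token perplexity is the reciprocal of each token's probability
    (equivalently, exp of that token's negative log-likelihood)."""
    return [1.0 / p for p in token_probs]

def sentence_perplexity(token_probs):
    """Total perplexity of the sentence: the exponential of the average
    negative log-likelihood over all tokens,
    ppx = exp(-(1/N) * sum(log p_i))."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)
```

The model's perplexity on its training dataset (item 4) is computed the same way, just averaged over the training corpus instead of a single sentence.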

To check whether a text is AI-generated, we compare the sentence perplexity with the model's perplexity scaled by a fudge factor, alpha. If the sentence perplexity is greater than the model's scaled perplexity, then it's probably human-written text (i.e., not AI-generated). Otherwise, it's probably AI-generated. The reasoning is that we expect the model not to be perplexed by text it could generate itself, so if it encounters text that it wouldn't generate, there's reason to believe the text isn't AI-generated. If the perplexity of the sentence is lower than or equal to the model's training perplexity with scaling, then it was likely generated using this language model, but we can't be very sure. That's because it's possible for a human to have written that text, and it just happens to be something the model could also have generated. After all, the model was trained on a lot of human-written text, so in some sense the model represents an "average human's writing".

ppx(x) in the formula above denotes the perplexity of the input "x".
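As a minimal sketch, the comparison against the scaled model perplexity looks like this. The fudge factor alpha is a tunable assumption; 1.1 below is just an illustrative default.

```python
def looks_ai_generated(sentence_ppx, model_ppx, alpha=1.1):
    """Flag text as possibly AI-generated when its perplexity does not
    exceed the model's training perplexity scaled by the fudge factor
    alpha. Above that threshold, the text is probably human-written."""
    return sentence_ppx <= model_ppx * alpha
```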

Next, let's take a look at some examples of human-written vs. AI-generated text.

Examples of AI-generated vs. human-written text

We've written some Python code that colors each token in a sentence based on its perplexity relative to the model's perplexity. The first token is always colored black, since we don't consider its perplexity. Tokens whose perplexity is lower than or equal to the model's scaled perplexity are colored red, indicating that they may be AI-generated, whereas tokens with higher perplexity are colored green, indicating that they were definitely not AI-generated.

The numbers in the square brackets before the sentence indicate the perplexity of the sentence as computed by the language model. Note that some words are part red and part green. This is because we used a subword tokenizer.

Here’s the code that generates the HTML above.

def get_html_for_token_perplexity(tok, sentence, tok_ppx, model_ppx):
    tokens = tok.encode(sentence).tokens
    ids = tok.encode(sentence).ids
    cleaned_tokens = []
    for word in tokens:
        # Replace the BPE word-boundary marker 'Ġ' (code point 288)
        # with a plain space so the token renders normally.
        chars = [' ' if ord(c) == 288 else c for c in word]
        cleaned_tokens.append(''.join(chars))
    # The first token is always black: we have no perplexity value for it.
    html = [
        f"<span style='color:black'>{cleaned_tokens[0]}</span>",
    ]
    for ct, ppx in zip(cleaned_tokens[1:], tok_ppx):
        color = "black"
        if ppx.item() >= 0:
            if ppx.item() <= model_ppx * 1.1:
                # Low perplexity: the model could plausibly have
                # generated this token.
                color = "red"
            else:
                # High perplexity: unlikely to be model-generated.
                color = "green"
        html.append(f"<span style='color:{color}'>{ct}</span>")
    return "".join(html)

As we can see from the examples above, if the model detects some text as human-written, it's definitely human-written, but if it detects the text as AI-generated, there's a chance that it isn't actually AI-generated. Why does this happen? Let's take a look next!

False positives

Our language model is trained on a LOT of text written by humans. It's generally hard to detect whether something was written (digitally) by a specific person. The model's training inputs comprise many different kinds of writing, likely produced by a huge number of people. This causes the model to learn many different writing styles and kinds of content. It's very likely that your writing style closely matches the style of some text the model was trained on. This is the cause of false positives, and it's why the model can't be sure that some text is AI-generated. However, the model can be sure that some text was human-written.

OpenAI: OpenAI recently announced that it would discontinue its tool for detecting AI-generated text, citing a low accuracy rate (Source: Hindustan Times).

The original version of the AI classifier tool had certain limitations and inaccuracies from the outset. Users were required to manually input at least 1,000 characters of text, which OpenAI then analyzed to classify as either AI- or human-written. Unfortunately, the tool's performance fell short, as it correctly identified only 26 percent of AI-generated content and mistakenly labeled human-written text as AI about 9 percent of the time.

Here's the blog post from OpenAI. It seems they used a different approach compared to the one described in this article.

Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe to be written by humans, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT. We divided each text into a prompt and a response. On these prompts, we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate low; in other words, we only mark text as likely AI-written if the classifier is very confident.

GPTZero: Another popular AI-generated-text detection tool is GPTZero. It appears that GPTZero uses perplexity and burstiness to detect AI-generated text. "Burstiness refers to the phenomenon where certain words or phrases appear in bursts within a text. In other words, if a word appears once in a text, it is likely to appear again in close proximity" (source).

GPTZero claims a very high success rate. According to the GPTZero FAQ, "At a threshold of 0.88, 85% of AI documents are classified as AI, and 99% of human documents are classified as human."
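GPTZero's actual burstiness computation isn't public, so the following is a toy illustration only: one hypothetical proxy that measures how closely repeated tokens cluster, where smaller average gaps mean burstier text.

```python
from collections import defaultdict

def burstiness_proxy(tokens):
    """Toy burstiness proxy: the average positional gap between repeat
    occurrences of the same token. Smaller values mean repeats cluster
    tightly (burstier text); infinity means no token ever repeats."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok.lower()].append(i)
    # Collect the gap between each consecutive pair of occurrences.
    gaps = [b - a
            for pos in positions.values() if len(pos) > 1
            for a, b in zip(pos, pos[1:])]
    return sum(gaps) / len(gaps) if gaps else float("inf")
```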

The generality of this approach

The approach described in this article doesn't generalize well. What we mean is that if you have 3 language models, for example GPT-3, GPT-3.5, and GPT-4, then you have to run the input text through all 3 models and check the perplexity on each of them to see whether the text was generated by any one of them. This is because each model generates text slightly differently, so each must independently evaluate the text to see whether it could have generated it.
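The per-model check can be sketched as a loop over candidate models. `CandidateModel` and its `perplexity` callable below are hypothetical stand-ins for whatever interface your models expose, not a real API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CandidateModel:
    name: str
    training_ppx: float                  # perplexity on its own training data
    perplexity: Callable[[str], float]   # scores a sentence under this model

def models_that_could_have_generated(text: str,
                                     models: List[CandidateModel],
                                     alpha: float = 1.1) -> List[str]:
    """Run the perplexity test against every candidate model. A model is
    a plausible source if the text's perplexity under it is at most alpha
    times that model's own training perplexity."""
    return [m.name for m in models
            if m.perplexity(text) <= m.training_ppx * alpha]
```

Note that the cost of this check grows linearly with the number of candidate models, which is exactly why the approach scales poorly.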

With the proliferation of large language models in the world as of August 2023, it seems unlikely that anyone can check an arbitrary piece of text against every language model in existence.

In fact, new models are being trained every day, and trying to keep up with this rapid progress seems hard at best.

The example below shows the result of asking our model to predict whether sentences generated by ChatGPT are AI-generated. As you can see, the results are mixed.

The sentences in the purple box are correctly identified as AI-generated by our model, whereas the rest are incorrectly identified as human-written.

There are many reasons why this may happen.

  1. Training corpus size: Our model is trained on very little text, whereas ChatGPT was trained on terabytes of text.
  2. Data distribution: Our model is trained on a different data distribution compared to ChatGPT.
  3. Fine-tuning: Our model is just a GPT model, whereas ChatGPT was fine-tuned for chat-like responses, making it generate text in a slightly different tone. If you had a model that generates legal text or medical advice, our model would perform poorly on text generated by those models as well.
  4. Model size: Our model is very small (fewer than 100M parameters, compared to more than 200B parameters for ChatGPT-like models).

It's clear that we need a better approach if we hope to provide a reasonably high-quality answer to whether any given text is AI-generated.

Next, let's take a look at some misinformation about this topic circulating around the web.
