GPT — Intuitively and Exhaustively Explained
A Brief Introduction to Transformers

Natural Language Processing | Machine Learning | ChatGPT

Exploring the architecture of OpenAI’s Generative Pre-trained Transformers.

“Mixture Expert” by the author using MidJourney. All images by the author unless otherwise specified.

In this article we’ll explore the evolution of OpenAI’s GPT models. We’ll briefly cover the transformer, describe the variations of the transformer that result in the original GPT model, then walk through GPT-1, GPT-2, GPT-3, and GPT-4 to build a complete conceptual understanding of the state of the art.

Who is this useful for? Anyone interested in natural language processing (NLP), or cutting-edge AI advancements.

How advanced is this post? This is not a complex post; it’s mostly conceptual. That said, it covers a lot of concepts, so it may be daunting to less experienced data scientists.

Prerequisites: I’ll briefly cover transformers in this article, but you can refer to my dedicated article on the topic for more information.

Before we get into GPT, I’d like to briefly go over the transformer. In its most basic sense, the transformer is an encoder-decoder style model.

A transformer working on a translation task. The input (“I am a manager”) is compressed to some abstract representation that encodes the meaning of the entire input. The decoder works recurrently, by feeding into itself, to construct the output. From my article on transformers.

The encoder converts an input into an abstract representation which the decoder uses to iteratively generate output.

A high-level representation of how the output of the encoder relates to the decoder. The decoder references the encoded input on each recursive loop of the output. From my article on transformers.
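To make that loop concrete, here’s a minimal sketch of encoder-decoder generation using PyTorch’s nn.Transformer. The vocabulary size, token ids, and dimensions are invented for illustration (and positional encodings are omitted for brevity); a real model would be trained and paired with a proper tokenizer.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, purely for illustration.
vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4, batch_first=True)
to_logits = nn.Linear(d_model, vocab_size)

BOS, EOS = 1, 2                     # hypothetical start/end-of-sequence ids
src = torch.tensor([[5, 6, 7, 8]])  # pretend this is the tokenized input

# Encode the input once: the "abstract representation" of the whole sentence.
memory = transformer.encoder(embed(src))

# Decode recurrently: each step feeds the tokens generated so far back in,
# and the decoder attends to the encoded input (memory) at every step.
generated = torch.tensor([[BOS]])
for _ in range(20):
    out = transformer.decoder(embed(generated), memory)
    next_token = to_logits(out[:, -1]).argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)
    if next_token.item() == EOS:
        break

print(generated)  # output token ids (meaningless here, since nothing is trained)
```

The key point is that the input is encoded once, while the decoder runs in a loop, consuming its own previous outputs and consulting the encoded input at every step.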

Both the encoder and the decoder work with an abstract representation of text, which is created using multi-headed self-attention.
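As a rough sketch of what that mechanism looks like in code, here’s a minimal multi-headed self-attention layer in PyTorch. The dimensions and names are assumptions for illustration; real implementations also add masking, dropout, and other details.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """A minimal multi-headed self-attention layer (no masking or dropout)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One projection each for queries, keys, and values, plus an output projection.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq, d_model = x.shape
        # Project the input, then split the result into independent heads.
        def split(t):
            return t.view(batch, seq, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Scaled dot-product attention: how much each token attends to every other token.
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        weights = scores.softmax(dim=-1)
        attended = weights @ v
        # Recombine the heads and project back to the model dimension.
        attended = attended.transpose(1, 2).reshape(batch, seq, d_model)
        return self.out_proj(attended)

# Example: 2 sequences of 5 tokens, each token represented by a 64-dim vector.
x = torch.randn(2, 5, 64)
attn = MultiHeadSelfAttention(d_model=64, n_heads=8)
print(attn(x).shape)  # torch.Size([2, 5, 64])
```

Each head attends to the sequence independently, letting the model capture several different kinds of relationships between tokens at once before the results are recombined.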
