Natural Language Processing | Machine Learning | ChatGPT
Exploring the architecture of OpenAI’s Generative Pre-trained Transformers.
In this article we’ll explore the evolution of OpenAI’s GPT models. We’ll briefly cover the transformer, describe the variations of the transformer that led to the first GPT model, then walk through GPT-1, GPT-2, GPT-3, and GPT-4 to build a complete conceptual understanding of the state of the art.
Who is this useful for? Anyone interested in natural language processing (NLP) or cutting-edge AI advancements.
How advanced is this post? It isn’t mathematically complex; it’s mostly conceptual. That said, it covers a lot of concepts, so it may be daunting to less experienced data scientists.
Pre-requisites: I’ll briefly cover transformers in this article, but you can refer to my dedicated article on the topic for more information.
Before we get into GPT, I would like to briefly go over the transformer. In its most basic sense, the transformer is an encoder-decoder style model.
The encoder converts an input into an abstract representation which the decoder uses to iteratively generate output.
Both the encoder and decoder employ an abstract representation of text, which is created using multi-headed self-attention.
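To make multi-headed self-attention a bit more concrete, here is a minimal sketch in PyTorch. The dimensions, function name, and projection matrices are illustrative assumptions for this example, not taken from any particular GPT implementation, and it omits details like masking, dropout, and learned layer wrappers.

```python
# A minimal sketch of multi-headed self-attention: every token builds its
# representation by attending to every other token, in several "heads" at once.
import torch
import torch.nn.functional as F

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (batch, seq_len, d_model); w_q/w_k/w_v/w_o are learned projections."""
    batch, seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input into queries, keys, and values, then split into heads.
    q = (x @ w_q).view(batch, seq_len, num_heads, d_head).transpose(1, 2)
    k = (x @ w_k).view(batch, seq_len, num_heads, d_head).transpose(1, 2)
    v = (x @ w_v).view(batch, seq_len, num_heads, d_head).transpose(1, 2)

    # Scaled dot-product attention: similarity scores between all token pairs,
    # normalized with softmax, then used to mix the value vectors.
    scores = (q @ k.transpose(-2, -1)) / d_head ** 0.5
    weights = F.softmax(scores, dim=-1)
    context = weights @ v  # (batch, num_heads, seq_len, d_head)

    # Recombine the heads and apply the output projection.
    context = context.transpose(1, 2).reshape(batch, seq_len, d_model)
    return context @ w_o

# Example: 2 sequences of 5 tokens, 64-dim embeddings, 4 attention heads.
d_model, heads = 64, 4
x = torch.randn(2, 5, d_model)
w_q, w_k, w_v, w_o = (torch.randn(d_model, d_model) for _ in range(4))
out = multi_head_self_attention(x, w_q, w_k, w_v, w_o, heads)
print(out.shape)  # torch.Size([2, 5, 64])
```

The key idea is that the output for each token is a weighted blend of all tokens in the sequence, with the weights learned from the data; running several heads in parallel lets the model capture different kinds of relationships at once.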