
Conformal Prediction, LLMs and HuggingFace - Part 1

Large Language Models (LLMs) are all the rage right now. They are used for a wide range of tasks, including text classification, question answering, and text generation. In this tutorial, we will show how to conformalize a transformer language model for text classification using ConformalPrediction.jl.
Specifically, we are interested in the task of intent classification, as illustrated in the sketch below. First, we feed a customer query into an LLM to generate embeddings. Next, we train a classifier to match these embeddings to possible intents. Of course, for this supervised learning problem we need training data consisting of inputs (queries) and outputs (labels indicating the true intent). Finally, we apply Conformal Prediction to quantify the predictive uncertainty of our classifier.
Conformal Prediction (CP) is a rapidly emerging methodology for Predictive Uncertainty Quantification. If you are unfamiliar with CP, you may want to first check out my 3-part introductory series on the subject, starting with this post.
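To build some intuition before we get to the library, here is a toy sketch of split conformal prediction for classification in plain Julia. It assumes we already have predicted class probabilities for a small calibration set; the data, the nonconformity score, and the coverage level below are illustrative assumptions, not part of the original tutorial.

```julia
# Toy split conformal prediction for classification (illustrative only).
using Statistics

# Hypothetical calibration data: softmax outputs (rows) and true labels.
probs_cal = [0.9 0.1; 0.8 0.2; 0.3 0.7; 0.6 0.4; 0.2 0.8]
y_cal = [1, 1, 2, 1, 2]

# Nonconformity score: 1 minus the probability assigned to the true class.
scores = [1 - probs_cal[i, y_cal[i]] for i in eachindex(y_cal)]

# Conformal quantile at coverage level 1 - α, with finite-sample correction.
α = 0.2
n = length(scores)
q̂ = quantile(scores, clamp(ceil((n + 1) * (1 - α)) / n, 0, 1))

# Prediction set for a new query: every class whose score stays below q̂.
probs_new = [0.7, 0.3]
prediction_set = findall(p -> 1 - p ≤ q̂, probs_new)  # here: class 1 only
```

The prediction set can contain several classes when the model is uncertain, which is exactly the behavior we will exploit later for intent classification.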
We will use the Banking77 dataset (Casanueva et al., 2020), which consists of 13,083 queries from 77 intents related to banking. On the model side, we will use the DistilRoBERTa model, a distilled version of RoBERTa (Liu et al., 2019) fine-tuned on the Banking77 dataset.
The model can be loaded from HF straight into our running Julia session using the Transformers.jl package.
This package makes working with HF models remarkably easy in Julia. Kudos to the devs!
Below we load the tokenizer tkr and the model mod. The tokenizer is used to convert the text into a sequence of integers, which is then fed into the model. The model outputs a hidden state, which is fed into a classifier to get the logits for each class. Finally, the logits are passed through a softmax function to get the corresponding predicted probabilities. Below we run a few queries through the model to see how it performs.
# Load model from HF 🤗:
tkr = …
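The code above is truncated, so here is a hedged sketch of what loading and querying the model might look like. The model id, the `softmax` source, and the exact call pattern are assumptions based on Transformers.jl's HuggingFace interface (the `hgf` string macro), not taken from the original post.

```julia
# Sketch only: model id and API details are assumptions.
using Transformers, Transformers.HuggingFace
using Transformers.TextEncoders: encode
using NNlib: softmax

# Load tokenizer and sequence-classification model from the HF hub
# (hypothetical model id for a DistilRoBERTa fine-tuned on Banking77).
tkr = hgf"mrm8488/distilroberta-finetuned-banking77:tokenizer"
mod = hgf"mrm8488/distilroberta-finetuned-banking77:ForSequenceClassification"

query = ["what is the base of the exchange rates?"]
enc = encode(tkr, query)      # text → token ids (plus attention mask)
out = mod(enc)                # forward pass → per-class logits
probs = softmax(out.logit)    # logits → predicted probabilities
```

Running this requires downloading the model weights from the HF hub, so treat it as a template to adapt rather than a drop-in snippet.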