A Gentle Introduction to Bayesian Deep Learning
Limitations of Traditional Deep Learning
Short Recap of Bayesian Statistics
Introduction to Bayesian Deep Learning
Benefits of Bayesian Deep Learning
Limitations of Bayesian Deep Learning
References

Welcome to the exciting world of Probabilistic Programming! This article is a gentle introduction to the field; you only need a basic understanding of Deep Learning and Bayesian statistics.


By the end of this article, you should have a basic understanding of the field, its applications, and how it differs from more traditional deep learning methods.

If, like me, you have heard of Bayesian Deep Learning, and you guess it involves Bayesian statistics, but you do not know exactly how it is used, you are in the right place.

One of the main limitations of traditional deep learning models is that, even though they are very powerful tools, they do not provide a measure of their uncertainty.

ChatGPT can state false information with blatant confidence. Classifiers output probabilities that are often poorly calibrated.

Uncertainty estimation is a crucial aspect of decision-making, especially in areas such as healthcare and self-driving cars. We want a model to be able to estimate when it is very unsure about classifying a subject with brain cancer, so that in this case we can require further diagnosis from a medical expert. Similarly, we want autonomous cars to be able to slow down when they identify a new environment.

To illustrate how badly a neural network can estimate risk, let's look at a very simple classifier neural network with a softmax layer at the end.

The softmax has a very understandable name: it is a Soft Max function, meaning a "smoother" version of the max function. The reason for this is that if we had picked a "hard" max function, simply taking the class with the highest probability, we would have a zero gradient for all the other classes.

With a softmax, the probability of a class can be close to 1, but never exactly 1. And since the probabilities of all classes sum to 1, there is still some gradient flowing to the other classes.

Hard max vs. soft max, Image by Author

However, the softmax function also presents a problem: it outputs probabilities that are poorly calibrated. Small changes in the values fed into the softmax are squashed by the exponential, causing only minimal changes in the output probabilities.

This often results in overconfidence, with the model giving high probabilities to certain classes even in the face of uncertainty, a characteristic inherent to the "max" nature of the softmax function.
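To make both points concrete, here is a minimal NumPy sketch (the logit values are made up for illustration) contrasting a hard max with a softmax, and showing how larger logits push the softmax toward overconfident outputs:

```python
import numpy as np

def softmax(logits):
    """Softmax: exp(z_i) / sum_j exp(z_j), shifted for numerical stability."""
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def hardmax(logits):
    """'Hard' max: all the probability mass on the argmax class."""
    probs = np.zeros_like(logits, dtype=float)
    probs[np.argmax(logits)] = 1.0
    return probs

logits = np.array([2.0, 1.0, 0.1])
print(hardmax(logits))      # [1. 0. 0.]        -> zero gradient for the other classes
print(softmax(logits))      # ~[0.66 0.24 0.10] -> some gradient flows everywhere
print(softmax(3 * logits))  # ~[0.95 0.05 0.00] -> larger logits, overconfident output
```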

Comparing a standard Neural Network (NN) with a Bayesian Neural Network (BNN) highlights the importance of uncertainty estimation. A BNN's certainty is high when it encounters familiar distributions from the training data, but as we move away from known distributions, the uncertainty increases, providing a more realistic estimate.

Here is what an estimate of uncertainty can look like:

Traditional NN vs. Bayesian NN, Image by Author

You can see that when we are close to the distribution we observed during training, the model is very certain, but as we move farther from the known distribution, the uncertainty increases.

There is one central theorem to know in Bayesian statistics: Bayes' theorem.

Bayes' theorem, Image by Author
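In symbols, with theta the parameter and X, Y the observations (each term is defined in the list below):

```latex
p(\theta \mid X, Y) = \frac{p(Y \mid X, \theta)\, p(\theta)}{p(Y \mid X)}
                    = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}
```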
  • The prior is the distribution of theta we believe is the most likely before any observation. For a coin toss, for example, we could assume that the probability of getting heads follows a Gaussian around p = 0.5.
  • If we want to put in as little inductive bias as possible, we could also say p is uniform between [0, 1].
  • The likelihood is: given a parameter theta, how likely is it that we got our observations X, Y?
  • The marginal likelihood is the likelihood integrated over all possible theta. It is called "marginal" because theta has been marginalized out by averaging over all its possible values.

The key idea to understand in Bayesian statistics is that you start from a prior; it is your best guess of what the parameter could be (and it is a distribution). Then, with the observations you make, you adjust your guess, and you obtain a posterior distribution.

Note that the prior and posterior are not point estimates of theta but probability distributions.

To illustrate this:

Image by Author

In this image you can see that the prior is shifted to the right, but the likelihood rebalances it to the left, and the posterior ends up somewhere in between.
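To make the update concrete, here is a minimal Python sketch of the coin-toss example above. It uses a Beta prior instead of the Gaussian mentioned earlier, since the Beta is conjugate to the Bernoulli likelihood and gives a closed-form posterior (the counts are made up for illustration):

```python
from scipy import stats

# Prior: Beta(2, 2), a mild belief that the coin is roughly fair (p around 0.5)
prior_a, prior_b = 2.0, 2.0

# Observations: 8 heads and 2 tails out of 10 tosses
heads, tails = 8, 2

# Conjugate update: Beta prior + Bernoulli likelihood -> Beta posterior
post_a, post_b = prior_a + heads, prior_b + tails

print("prior mean:     ", prior_a / (prior_a + prior_b))  # 0.5
print("posterior mean: ", post_a / (post_a + post_b))     # ~0.714, pulled toward the data
print("95% credible interval:",
      stats.beta.ppf([0.025, 0.975], post_a, post_b))     # the posterior is a distribution,
                                                          # not a point estimate
```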

Bayesian Deep Learning is an approach that marries two powerful mathematical theories: Bayesian statistics and Deep Learning.

The main distinction from traditional Deep Learning lies in the treatment of the model's weights:

In traditional Deep Learning, we train a model from scratch: we randomly initialize a set of weights and train the model until it converges to a new set of parameters. We learn a single set of weights.

Conversely, Bayesian Deep Learning adopts a more dynamic approach. We start with a prior belief about the weights, often assuming they follow a normal distribution. As we expose our model to data, we adjust this belief, thus updating the posterior distribution of the weights. In essence, we learn a probability distribution over the weights, instead of a single set.

During inference, we average the predictions of all these models, weighting their contributions by the posterior. This means that if a set of weights is highly probable, its corresponding prediction is given more weight.

Let’s formalize all of that:

Inference, Image by Author

Inference in Bayesian Deep Learning integrates over all possible values of theta (the weights), using the posterior distribution.
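Written out, with x a new input, y its prediction, and (X, Y) the training data, the predictive distribution is:

```latex
p(y \mid x, X, Y) = \int p(y \mid x, \theta)\; p(\theta \mid X, Y)\; d\theta
```

Each model's prediction p(y | x, theta) is weighted by how probable its weights are under the posterior, which is exactly the weighted averaging described above.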

We can also see that in Bayesian statistics, integrals are everywhere. This is actually the main limitation of the Bayesian framework: these integrals are often intractable (we do not always know an antiderivative of the posterior), so we have to resort to very computationally expensive approximations.
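In practice, the integral is typically approximated with a Monte Carlo average: draw a finite number of weight samples from (an approximation of) the posterior and average their predictions. A minimal NumPy sketch, where the toy model and the Gaussian "posterior" over its two weights are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend posterior over the two weights of a toy logistic-regression "network".
# In practice these would come from Variational Inference or MCMC, not set by hand.
w_mean = np.array([1.5, -0.7])
w_std = np.array([0.3, 0.2])

def predict(x, w):
    """Toy model: sigmoid(w0 * x + w1)."""
    return 1.0 / (1.0 + np.exp(-(w[0] * x + w[1])))

def bayesian_predict(x, n_samples=10_000):
    """Monte Carlo approximation of the predictive integral:
    average the predictions of n_samples models sampled from the posterior."""
    ws = rng.normal(w_mean, w_std, size=(n_samples, 2))  # theta ~ p(theta | X, Y)
    preds = predict(x, ws.T)                             # one prediction per weight sample
    return preds.mean(), preds.std()                     # predictive mean and uncertainty

mean, std = bayesian_predict(x=0.5)
print(f"prediction: {mean:.3f} +/- {std:.3f}")
```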

Advantage 1: Uncertainty estimation

  • Arguably the most prominent benefit of Bayesian Deep Learning is its capacity for uncertainty estimation. In many domains, including healthcare, autonomous driving, language models, computer vision, and quantitative finance, the ability to quantify uncertainty is crucial for making informed decisions and managing risk.

Advantage 2: Improved training efficiency

  • Closely tied to uncertainty estimation is improved training efficiency. Since Bayesian models are aware of their own uncertainty, they can prioritize learning from the data points where the uncertainty, and hence the potential for learning, is highest. This approach, known as Active Learning, leads to impressively effective and efficient training.
Demonstration of the effectiveness of Active Learning, Image by Author

As demonstrated in the graph above, a Bayesian Neural Network using Active Learning achieves 98% accuracy with just 1,000 training images. In contrast, models that do not exploit uncertainty estimation tend to learn at a slower pace.
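As an illustration of the Active Learning loop (the acquisition rule here, maximum predictive entropy, is one common choice and an assumption on my part, not necessarily the one behind the graph):

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Uncertainty score per point.
    mc_probs: (n_weight_samples, n_points, n_classes) class probabilities
    obtained by running the BNN with different sampled weights."""
    mean_probs = mc_probs.mean(axis=0)  # average over weight samples
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)

def select_queries(mc_probs, k=10):
    """Pick the k unlabeled points the model is most uncertain about."""
    return np.argsort(-predictive_entropy(mc_probs))[:k]

# Fake MC predictions: 50 weight samples, 100 unlabeled points, 3 classes
rng = np.random.default_rng(1)
mc_probs = rng.dirichlet([1.0, 1.0, 1.0], size=(50, 100))
print("send these points to the labeler:", select_queries(mc_probs))
```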

Advantage 3: Inductive Bias

Another advantage of Bayesian Deep Learning is the effective use of inductive bias through priors. Priors allow us to encode our initial beliefs or assumptions about the model parameters, which can be particularly useful in scenarios where domain knowledge exists.

Consider generative AI, where the idea is to create new data (like medical images) that resemble the training data. For example, if you are generating brain images and you already know the general layout of a brain (white matter inside, gray matter outside), this knowledge can be included in your prior. This means you can assign a higher probability to the presence of white matter in the center of the image, and gray matter toward the edges.

In essence, Bayesian Deep Learning not only empowers models to learn from data but also enables them to start learning from a point of knowledge, rather than from scratch. This makes it a potent tool for a wide range of applications.

White Matter and Gray Matter, Image by Author

It seems that Bayesian Deep Learning is incredible! So why is this field so underrated? Indeed, we often talk about Generative AI, ChatGPT, SAM, or more traditional neural networks, but we almost never hear about Bayesian Deep Learning. Why is that?

Limitation 1: Bayesian Deep Learning is slooooow

The key thing to understand about Bayesian Deep Learning is that we "average" the predictions of the model, and every time there is an average, there is an integral over the set of parameters.

But computing an integral is often intractable, meaning there is no closed or explicit form that makes it quick to compute. So we cannot compute it directly; we have to approximate the integral by sampling some points, and this makes inference very slow.

Imagine that for each data point x we have to average out the predictions of 10,000 models, and that each prediction takes 1 s to run; we end up with a model that does not scale to a large amount of data.

In most business cases we need fast and scalable inference, which is why Bayesian Deep Learning is not so popular.

Limitation 2: Approximation Errors

In Bayesian Deep Learning, it is often necessary to use approximate methods, such as Variational Inference, to compute the posterior distribution of the weights. These approximations can introduce errors into the final model. The quality of the approximation depends on the choice of the variational family and the divergence measure, which can be difficult to choose and tune properly.
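For reference, the standard objective in Variational Inference picks a tractable family q_phi(theta) and maximizes the evidence lower bound (ELBO):

```latex
\mathcal{L}(\phi) = \mathbb{E}_{q_\phi(\theta)}\big[\log p(Y \mid X, \theta)\big]
                  - \mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta)\big)
```

The first term rewards fitting the data, while the KL term keeps the approximation close to the prior; the approximation error discussed above is the remaining gap between q_phi and the true posterior.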

Limitation 3: Increased Model Complexity and Interpretability

While Bayesian methods offer improved measures of uncertainty, this comes at the cost of increased model complexity. BNNs can be difficult to interpret because, instead of a single set of weights, we now have a distribution over possible weights. This complexity can lead to challenges in explaining the model's decisions, especially in fields where interpretability is essential.

There is a growing interest in XAI (Explainable AI). Traditional deep neural networks are already difficult to interpret because it is hard to make sense of their weights; Bayesian Deep Learning is even more difficult.

Whether you have feedback, ideas to share, want to work with me, or just wish to say hello, please fill out the form below, and let's start a conversation.

Say Hello 🌿

Don't hesitate to leave a clap or follow me for more!

  1. Ghahramani, Z. (2015). Probabilistic machine learning and artificial intelligence. Nature, 521(7553), 452–459. Link
  2. Blundell, C., Cornebise, J., Kavukcuoglu, K., & Wierstra, D. (2015). Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424. Link
  3. Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050–1059). Link
  4. Louizos, C., Welling, M., & Kingma, D. P. (2017). Learning sparse neural networks through L0 regularization. arXiv preprint arXiv:1712.01312. Link
  5. Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science & Business Media. Link
