What Are Variational Autoencoders (VAEs)?
Variational Autoencoders (VAEs) are a type of artificial neural network architecture used to generate new data. Like regular autoencoders, they consist of an encoder and a decoder. The encoder takes input data and compresses it into a latent representation, while the decoder tries to reconstruct the original data from that compressed representation.
However, VAEs differ from regular autoencoders in two important aspects:
- Goal: While regular autoencoders aim to minimise the reconstruction error between the input and the reconstructed data, VAEs have an additional goal of learning a latent space that captures the underlying probability distribution of the data. This allows them to generate new data samples that resemble the original data.
- Latent Representation: Unlike regular autoencoders, which map each input to a single point in the latent space, VAEs map each input to a probability distribution by introducing randomness into the encoding process. This yields a more robust and diverse latent space.
How Do Variational Autoencoders Work?
Variational autoencoders (VAEs) work through a combination of encoding, sampling, decoding, and loss function optimisation:
- Encoding:
- The encoder, a neural network, takes an input data point (for example, an image) and compresses it into a latent representation.
- Unlike regular autoencoders, which output a single point, VAEs aim to capture the underlying probability distribution of the data.
- This is achieved by having the encoder output the mean and variance of a probability distribution, typically a Gaussian (normal) distribution.
- Sampling:
- A sample is then drawn from the encoded probability distribution. This sample represents a point in the latent space that captures the essential features of the original data.
- The randomness introduced in this step allows the model to explore different variations within the learned distribution. In practice, sampling is done with the reparameterisation trick (z = μ + σ · ε, with ε drawn from a standard normal), which keeps this step differentiable so the network can still be trained with gradient descent.
- Decoding:
- The decoder, another neural network, takes the sampled point from the latent space as input.
- Its goal is to reconstruct the original data point based on the information contained in the sampled latent representation.
- Loss Function Optimisation: The VAE is trained by minimising a loss function that combines two terms (a code sketch of the full training step follows this list):
- Reconstruction Loss: This measures the difference between the original data and the decoder's reconstruction. It encourages the model to learn an accurate representation of the data.
- KL Divergence: This term measures the difference between the encoded distribution (represented by mean and variance) and a prior distribution (usually a standard normal distribution). Minimising this term encourages the encoded distribution to be “closer” to the prior, promoting smoothness and interpretability in the latent space.
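To make these steps concrete, the following is a minimal PyTorch sketch of a VAE and its training objective for flattened 28×28 images. All names, layer sizes, and hyperparameters here are illustrative assumptions rather than a reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoding: compress the input, then output the parameters
        # (mean and log-variance) of a Gaussian over the latent space.
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoding: map a latent sample back to data space.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Sampling via the reparameterisation trick: z = mu + sigma * eps,
        # which keeps the random draw differentiable during training.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction loss (assumes pixel values in [0, 1]).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # KL divergence between N(mu, sigma^2) and the standard normal prior,
    # using its closed form: -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Training then simply loops over batches, computes `vae_loss` on the model's output, and back-propagates as with any other network.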
By minimising reconstruction loss and KL divergence together, the VAE learns to encode data efficiently while capturing its essential features and maintaining a smooth, interpretable latent space.
This also allows VAEs to generate new data samples by drawing samples from the latent space and feeding them through the decoder. These generated samples will share characteristics with the training data. Further, the latent space can be explored to understand the underlying relationships and variations within the data.
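As a brief illustration of that generation step, and assuming a trained instance `model` of the hypothetical `VAE` class sketched above, new samples come from decoding draws from the standard normal prior:

```python
import torch

# Generate new data: draw latent points from the N(0, I) prior
# and feed them through the decoder.
model.eval()
with torch.no_grad():
    z = torch.randn(16, 20)   # 16 draws from the 20-dimensional latent prior
    samples = model.dec(z)    # decoded samples that resemble the training data
```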
How Are Variational Autoencoders Used In GenAI?
This artificial neural network architecture plays a significant role in GenAI because it can learn the underlying structure and probability distribution of data and then generate new samples that resemble it. The following are some of the important applications of VAEs within GenAI:
- Image Generation: VAEs excel at generating realistic and diverse images, including:
- New Faces: VAEs can be trained on a dataset of faces and then generate new, never-before-seen faces that retain the characteristics of the training data.
- Landscapes: Similarly, VAEs can be used to generate novel landscapes with various features like mountains, rivers, and vegetation.
- Editing & Manipulating Existing Images: By modifying the latent representation in the VAE, users can manipulate existing images.
- Data Augmentation: GenAI often requires large and diverse datasets for training. VAEs can be used to artificially increase the size and diversity of existing datasets by generating new, realistic samples that share the same statistical properties as the original data. This is crucial for improving the performance of various machine-learning models.
- Anomaly Detection: VAEs can be employed to identify data points that deviate significantly from the learned distribution (a short code sketch follows this list). This capability makes them valuable for anomaly detection tasks in domains such as:
- Fraud Detection: Identifying fraudulent transactions in financial data.
- Equipment Failure Prediction: Detecting anomalies in sensor data that might indicate potential equipment failure.
- Medical Anomaly Detection: Identifying unusual patterns in medical images, potentially indicating abnormalities.
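A common recipe for such tasks, sketched here under the assumption that inputs the model reconstructs poorly lie far from the learned distribution (again reusing the hypothetical `VAE` above, with `model`, `batch`, and `threshold` as placeholders):

```python
import torch
import torch.nn.functional as F

def anomaly_scores(model, x):
    # Score each input by its reconstruction error; high error suggests
    # the point does not fit the distribution the VAE has learned.
    model.eval()
    with torch.no_grad():
        x_hat, mu, logvar = model(x)
        return F.mse_loss(x_hat, x, reduction="none").sum(dim=1)

# Flag inputs whose score exceeds a threshold, e.g. one chosen as a high
# percentile of scores on known-normal data (an assumption, not a rule).
scores = anomaly_scores(model, batch)
anomalies = batch[scores > threshold]
```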
What Are The Advantages & Disadvantages Of Variational Autoencoders?
As is the case with every artificial neural network architecture, variational autoencoders have both benefits and drawbacks:
Advantages Of Variational Autoencoders (VAEs):
- Generative Capabilities: VAEs excel at generating new data samples that resemble the training data, making them valuable for applications like image synthesis, music composition, and data augmentation.
- Latent Space Manipulation: VAEs explicitly model a latent space, which captures the underlying factors or features of the data. This allows for precise control and manipulation of data features, fostering interpretability and customisation in various tasks.
- Unsupervised Learning: Variational autoencoders can be trained on unlabelled data, making them suitable for scenarios where labelled data is scarce or expensive to obtain. This is useful in domains like anomaly detection and exploring new datasets.
- Probabilistic Formulation: The probabilistic nature of VAEs allows for greater flexibility and control in data generation compared to deterministic methods. This enables a more diverse and realistic sample generation.
Disadvantages Of Variational Autoencoders (VAEs):
- Training Challenges: Training VAEs can be more challenging than training some other models due to the complexity of the objective function and potential issues like posterior collapse, where the decoder learns to ignore the latent code and the model produces only a narrow range of outputs.
- Computational Cost: Training VAEs, especially with complex architectures, can be computationally expensive compared to simpler models.
- Reconstruction Quality: While VAEs can generate new data, their reconstructions tend to be lower-fidelity than the originals, often visibly blurry for complex data like high-resolution images.
- Latent Space Interpretability: While VAEs offer a latent space, interpreting its individual dimensions and how they relate to the data can be challenging, limiting interpretability in some situations.