- Hands-On Generative Adversarial Networks with Keras
- Rafael Valle
Variational autoencoders
Auto-encoder models are used to model $p(x, z)$, the joint probability of $x$, the observed data, and $z$, the latent variable. The joint probability is normally factorized as $p(x, z) = p(x \mid z)\,p(z)$. During inference, we are interested in finding good values of $z$ to produce the observed data – that is, we are interested in learning $p(z \mid x)$, the posterior probability of $z$ given $x$. Using Bayes' rule, we can rewrite the posterior as follows:

$$p(z \mid x) = \frac{p(x \mid z)\,p(z)}{p(x)}$$
Close inspection of this equation reveals that computing the evidence, the marginal distribution of the data $p(x)$, is hardly possible and normally intractable. We first try to circumvent this barrier by approximating the posterior instead of computing the evidence directly: using variational inference, we estimate the parameters of a known distribution $q_\lambda(z \mid x)$ that is the least divergent from the posterior $p(z \mid x)$. Variational inference approximates the posterior $p(z \mid x)$ with a family of distributions $q_\lambda(z \mid x)$, where the variational parameter $\lambda$ indexes the family of distributions. This can be done by minimizing the KL divergence between $q_\lambda(z \mid x)$ and $p(z \mid x)$, as described in the following equation:

$$\mathrm{KL}\big(q_\lambda(z \mid x)\,\|\,p(z \mid x)\big) = \mathbb{E}_{q_\lambda(z \mid x)}\big[\log q_\lambda(z \mid x)\big] - \mathbb{E}_{q_\lambda(z \mid x)}\big[\log p(x, z)\big] + \log p(x)$$
Unfortunately, using $q_\lambda(z \mid x)$ does not circumvent the problem, and we are still faced with computing the evidence $p(x)$. At this point, we give up on computing the exact evidence and focus on estimating an Evidence Lower Bound (ELBO). The ELBO sets the perfect scenario for Variational Autoencoders, and is computed by removing $\log p(x)$ from the previous equation and inverting the signs, giving:

$$\mathrm{ELBO}(\lambda) = \mathbb{E}_{q_\lambda(z \mid x)}\big[\log p(x, z)\big] - \mathbb{E}_{q_\lambda(z \mid x)}\big[\log q_\lambda(z \mid x)\big]$$
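Rearranging the KL expansion above makes explicit why this quantity is a lower bound on the log evidence. Since the KL divergence is non-negative,

$$\log p(x) = \mathrm{ELBO}(\lambda) + \mathrm{KL}\big(q_\lambda(z \mid x)\,\|\,p(z \mid x)\big) \;\geq\; \mathrm{ELBO}(\lambda),$$

so maximizing the ELBO simultaneously pushes up a lower bound on $\log p(x)$ and drives $q_\lambda(z \mid x)$ towards the true posterior.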
VAEs consist of an encoder $q_\theta(z \mid x)$, parametrized by $\theta$, and a decoder $p_\phi(x \mid z)$, parametrized by $\phi$. The encoder is trained to maximize the posterior probability of the latent vector $z$ given the data $x$, $q_\theta(z \mid x)$. The decoder is trained to maximize the probability of the data $x$ given the latent vector $z$, $p_\phi(x \mid z)$. Informally speaking, the encoder learns to compress the data into a latent representation, and the decoder learns to decompress the data from the latent representation. The VAE loss is the negative of the ELBO with the joint probability factorized as $p(x, z) = p_\phi(x \mid z)\,p(z)$, and is defined as follows:

$$\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\theta(z \mid x)}\big[\log p_\phi(x \mid z)\big] + \mathrm{KL}\big(q_\theta(z \mid x)\,\|\,p(z)\big)$$
The first term represents the reconstruction loss, the expected negative log-likelihood of the data under the decoder. The second term is a regularization term, the KL divergence between the approximate posterior and the prior, which was derived in our problem setup.
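As a concrete illustration, the following is a minimal Keras sketch of this loss, not the book's exact code. It assumes flattened 28x28 inputs, a fully connected encoder and decoder, a Gaussian approximate posterior with a standard normal prior (so the KL term has a closed form), and a Bernoulli decoder for the reconstruction term; the layer sizes and the helper classes `Sampling` and `VAELoss` are illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2   # assumed size of the latent vector z
input_dim = 784  # assumed input size (flattened 28x28 images)


class Sampling(layers.Layer):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps


class VAELoss(layers.Layer):
    """Attaches the reconstruction + KL terms as the model loss."""
    def call(self, inputs):
        x, x_recon, z_mean, z_log_var = inputs
        # Reconstruction term: expected negative log-likelihood of x
        # under a Bernoulli decoder (summed over pixels).
        recon = -tf.reduce_sum(
            x * tf.math.log(x_recon + 1e-7)
            + (1.0 - x) * tf.math.log(1.0 - x_recon + 1e-7),
            axis=-1)
        # Regularization term: KL(q(z|x) || N(0, I)) in closed form.
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var),
            axis=-1)
        self.add_loss(tf.reduce_mean(recon + kl))
        return x_recon


# Encoder q_theta(z|x): outputs the mean and log-variance of the posterior.
x_in = layers.Input(shape=(input_dim,))
h = layers.Dense(256, activation="relu")(x_in)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z = Sampling()([z_mean, z_log_var])

# Decoder p_phi(x|z): maps the latent vector back to pixel probabilities.
h_dec = layers.Dense(256, activation="relu")(z)
x_recon = layers.Dense(input_dim, activation="sigmoid")(h_dec)

vae = Model(x_in, VAELoss()([x_in, x_recon, z_mean, z_log_var]))
vae.compile(optimizer="adam")  # the loss is already attached via add_loss
# vae.fit(x_train, epochs=10, batch_size=128)  # no targets are needed
```

Because the approximate posterior and the prior are both Gaussian, the KL term is computed analytically from `z_mean` and `z_log_var`, while the reconstruction term is estimated with a single sample of $z$ per example, the usual Monte Carlo approximation.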
Unlike autoregressive models, VAEs are normally easy to parallelize during both training and inference. On the other hand, they are normally harder to optimize than autoregressive models.
The deep feature-consistent VAE is one of the best VAE-based models for image generation. The following figure shows faces generated by the model; from a qualitative perspective, image samples produced with VAEs tend to be blurry.

(Figure: face samples generated by the deep feature-consistent VAE)