Generative Adversarial Networks
Data is a massively abundant, widely available resource today. So why is there still a need for data generation techniques or generative models? Most of the data available is unlabeled, and since supervised techniques require labelled data, which can be impractical to acquire, there is a large use case for techniques that can model the distribution of available datasets and provide information about them, while also offering a way to generate new data samples from the learned distribution.
As the name suggests, Generative Adversarial Networks (GANs) are a type of generative model. Generative models try to model the joint probability of the input data and labels, P(x, y), in contrast to discriminative models, which only map the relationship between input data and an output class label, P(y|x). Generative models can thus be used for classification as well as for creating new samples.
Hailed as one of the major advancements in deep learning in recent times, GANs were invented by Ian Goodfellow in 2014. GANs primarily consist of two adversarial deep neural networks, a discriminator and a generator. You can think of it as a game between counterfeiters and the police. On one end, the counterfeiter (the generator in this case) tries to create almost-real currency in order to fool the police (the discriminator), while the police try to become better at catching the fake or counterfeit items.
The discriminator has the task of determining whether a given image comes from the dataset or has been artificially created. The generator, on the other hand, creates natural-looking images that are similar to the original data distribution. It takes random noise as input, transforms it through a series of deep learning layers, and outputs a sample of data, with the goal of producing diverse samples from the true data distribution it learns through feedback from the discriminator. The discriminator's loss gradients are therefore back-propagated through the combined network to the generator. This can be thought of as a zero-sum or minimax two-player game.
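The two halves of this minimax game can be sketched numerically. The following is a minimal NumPy illustration (not code from any of the referenced blogs); the function names are my own, and the generator loss shown is the common non-saturating variant rather than the literal minimax term:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator maximizes log D(x) + log(1 - D(G(z)));
    # equivalently, it minimizes the negative of that sum.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating variant: the generator maximizes log D(G(z))
    # instead of minimizing log(1 - D(G(z))), which gives stronger
    # gradients early in training.
    return -np.mean(np.log(d_fake))

# Toy example: discriminator outputs (probabilities of "real")
d_real = np.array([0.9, 0.8, 0.95])   # on real samples
d_fake = np.array([0.1, 0.2, 0.05])   # on generated samples
print(discriminator_loss(d_real, d_fake))  # low: D is confident and correct
print(generator_loss(d_fake))              # high: G is failing to fool D
```

When the discriminator classifies confidently and correctly, its loss is small and the generator's loss is large; as the generator improves, the balance shifts.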
GANs provide the opportunity to use a latent code for generation, creating asymptotically consistent results. Since their initial invention in 2014, GANs have been combined with various other techniques to produce realistic results on a number of image datasets. Various improvements have also been proposed to the initial GAN prototype, including (but not limited to) minibatch training, batch normalization, historical averaging, and one-sided label smoothing.
Getting acquainted with GANs using TensorFlow
To get an initial idea of what GANs are and how they are implemented in TensorFlow, I followed the blog by Agustinus Kristiadi. The code used can be found at http://wiseodd.github.io/techblog/2016/09/17/gan-tensorflow/. The blog implements a vanilla GAN, using a basic deep net with one hidden layer as the discriminator and a two-layer deep net as the generator. Running this code for 100,000 iterations required only a short computation time due to the simplicity of the networks, and provided surprisingly accurate representations of the MNIST dataset. Sampled outputs from the first and last iterations of the GAN are depicted below.
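To make the architecture concrete, here is a minimal NumPy sketch of the forward passes in that setup (a one-hidden-layer discriminator and a two-layer generator). This is my own illustration, not the blog's actual TensorFlow code; the layer sizes follow the common MNIST convention of 784-dimensional images and 100-dimensional noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

z_dim, h_dim, x_dim = 100, 128, 784   # noise, hidden, and image sizes

# Generator: two-layer net, noise z -> fake image with pixels in [0, 1].
G_W1 = rng.standard_normal((z_dim, h_dim)) * 0.01
G_b1 = np.zeros(h_dim)
G_W2 = rng.standard_normal((h_dim, x_dim)) * 0.01
G_b2 = np.zeros(x_dim)

def generator(z):
    h = np.maximum(0.0, z @ G_W1 + G_b1)   # ReLU hidden layer
    return sigmoid(h @ G_W2 + G_b2)        # pixel values in [0, 1]

# Discriminator: one hidden layer, image -> probability of "real".
D_W1 = rng.standard_normal((x_dim, h_dim)) * 0.01
D_b1 = np.zeros(h_dim)
D_W2 = rng.standard_normal((h_dim, 1)) * 0.01
D_b2 = np.zeros(1)

def discriminator(x):
    h = np.maximum(0.0, x @ D_W1 + D_b1)
    return sigmoid(h @ D_W2 + D_b2)

z = rng.standard_normal((16, z_dim))            # a minibatch of noise
fake = generator(z)
print(fake.shape, discriminator(fake).shape)    # (16, 784) (16, 1)
```

Training alternates gradient updates between the two networks using the adversarial losses described earlier; the blog's TensorFlow code handles those updates automatically.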
Using improvements to GANs with more complex deep learning networks
In the vanilla GAN used above, the generator and discriminator are equally powerful. It is sometimes advised, when building GANs, to give the discriminator a deeper network than the generator. Hence, after getting acquainted with a simple GAN setup, I decided to venture into more complex network architectures for the discriminator and generator. Several problems can be encountered while training GANs; chief among them is mode collapse, where the generator learns to produce samples with extremely low variety in order to trick the discriminator. Techniques like using Leaky ReLU instead of ReLU, using tanh as the activation function in the final layer, and batch normalization have been successful in overcoming this challenge, while also speeding up computation in some cases, i.e. giving better results in fewer iterations.
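Two of these tricks are simple enough to show directly. A minimal sketch (my own illustration, with an assumed slope of 0.2 for the leaky unit):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Unlike plain ReLU, a small slope for x < 0 keeps gradients
    # flowing through units that would otherwise "die" and stop
    # learning, which helps the discriminator train.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))   # -> values [-0.4, -0.1, 0.0, 1.5]
print(np.tanh(x))      # final-layer activation, squashes output to (-1, 1)
```

Note that a tanh final layer implies scaling the training images to the range [-1, 1] so that real and generated samples live in the same space.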
The second GAN implemented for this post was based on an article by Richard Kelley. It consists of a three-layer deep neural network for both the discriminator and the generator, with Leaky ReLU as the activation function. Again, the images used are from the MNIST dataset. Some key points about the network include learning rate decay and a momentum adjuster based on pylearn2.
Some modifications were made to the network proposed by Richard Kelley, based on techniques and tricks I read about on different blogs. The activation function for the final layer is tanh (instead of the sigmoid originally used in the article). The dimensionality of the input noise is 100, as that was the dimensionality of choice in numerous blogs. I also tried using AdamOptimizer instead of MomentumOptimizer, as suggested by numerous blogs, but found no substantial difference in results.
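For reference, the update rules behind those two optimizers differ as follows. This is a minimal NumPy sketch of a single parameter update (illustrative function names, default hyperparameters assumed), not the TensorFlow implementations themselves:

```python
import numpy as np

def momentum_step(w, g, v, lr=0.01, mu=0.9):
    # Classical momentum: accumulate a velocity and move along it.
    v = mu * v - lr * g
    return w + v, v

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps running means of the gradient (m) and its square (v),
    # with bias correction for the first few steps, giving a
    # per-parameter adaptive learning rate.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
w_m, _ = momentum_step(w, g, np.zeros_like(w))
w_a, _, _ = adam_step(w, g, np.zeros_like(w), np.zeros_like(w), t=1)
print(w_m, w_a)
```

Adam's per-parameter scaling often makes it a more forgiving default for GAN training, though, as noted above, in this experiment the two gave comparable results.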
Since the initial model faced the problem of mode collapse (after around 100 iterations, the generated images had little or no variation), I introduced batch normalization before every layer of both the generator and the discriminator. Due to the complexity of the GAN and a lack of computational resources, it was only trained for 50,000 iterations (the blog recommends 100,000), so the results obtained were not substantial.
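The batch normalization operation itself is straightforward. A minimal training-mode sketch (my own illustration; a full implementation would also track running statistics for use at inference time and learn gamma and beta as parameters):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature to zero mean and unit variance over the
    # minibatch, then rescale and shift. Keeping activations in a
    # stable range is one of the tricks used against mode collapse.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# A minibatch of 64 activations with arbitrary scale and offset.
x = np.random.default_rng(0).standard_normal((64, 10)) * 5 + 3
y = batch_norm(x)
print(y.mean(axis=0).round(6))   # approximately zero per feature
print(y.std(axis=0).round(3))    # approximately one per feature
```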
The output from this GAN is depicted below:
For completeness' sake, here is a list of references used to complete this assignment: