There are thousands of academic papers on arXiv, so which ones should you read? I read hundreds of GAN papers while researching my book, and below are the 12 most influential ones (from 2014 to 2019) I found. There haven't been many breakthrough GAN papers after 2019. Click the names and images to go to the sources.
- Generative Adversarial Networks. The very first GAN paper, written by Ian Goodfellow et al. in 2014. It describes the GAN architecture, which consists of a generator and a discriminator, and provides a mathematical derivation of the adversarial loss.
- Auto-Encoding Variational Bayes. The variational autoencoder (VAE), which encodes high-dimensional pixels into a low-dimensional latent space. Many advanced GANs use a VAE as the encoder.
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. DCGAN established the CNN architectures used in the generator and the discriminator. It also demonstrates the use of vector arithmetic for latent-space interpolation and exploration.
- Wasserstein GAN. This paper shows mathematically why GAN training is unstable. The Wasserstein loss itself was not widely adopted later, but its mathematically rigorous analysis of GANs using the Lipschitz constraint inspired innovations that made GAN training easier.
- Conditional Generative Adversarial Nets. Earlier GANs generate images from random noise alone. This paper shows how to encode class labels into embeddings and use them to generate samples of the desired class.
- Image-to-Image Translation with Conditional Adversarial Networks. Pix2pix, the first image-to-image GAN to catch the public's attention, including the sketch-to-cat application. It also popularized the use of PatchGAN (Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks, https://arxiv.org/abs/1604.04382) in the discriminator to increase the fidelity of generated images.
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. CycleGAN allows unpaired image translation, most famous for converting horses into zebras.
- A Neural Algorithm of Artistic Style. Neural style transfer, which converts photos into artistic paintings. To me, this is the most underrated paper in image generation. It led the research on disentanglement by separating an image into style and content, which eventually led to the creation of StyleGAN.
- Progressive Growing of GANs for Improved Quality, Stability, and Variation. ProgressiveGAN, the first to generate high-fidelity portraits at 1024x1024 resolution by growing the layers progressively.
- A Style-Based Generator Architecture for Generative Adversarial Networks. StyleGAN. The paper incorporates style disentanglement into ProgressiveGAN, making it the SOTA in face generation. Its successor, StyleGAN2, improves image quality only subtly; its main contribution is improved computational efficiency.
- Spectral Normalization for Generative Adversarial Networks. Spectral normalization is an important technique that stabilizes GAN training by limiting the growth of the weights. It is used in practically every GAN now.
- Self-Attention Generative Adversarial Networks. Self-Attention GAN (SAGAN). Transformers displaced RNNs and LSTMs in natural language processing (NLP) and are making waves in computer vision. SAGAN introduces self-attention (the core of the transformer) into GANs to capture the great variation across different image classes.
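To make the adversarial loss from the first paper on the list concrete, here is a minimal numerical sketch in NumPy. The discriminator scores below are made-up numbers for illustration, not outputs of a trained model; the generator loss uses the non-saturating form from the original GAN paper.

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy on sigmoid probabilities, clipped for stability.
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# Hypothetical discriminator outputs (probability that an image is real).
d_real = np.array([0.9, 0.8, 0.95])  # scores on a batch of real images
d_fake = np.array([0.1, 0.3, 0.2])   # scores on a batch of generated images

# The discriminator is trained to push real -> 1 and fake -> 0.
d_loss = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))

# The generator is trained to push fake -> 1 (non-saturating loss).
g_loss = bce(d_fake, np.ones(3))
```

Because the discriminator currently rejects the fakes (scores near 0), the generator loss is much larger than the discriminator loss; in training, the two losses are minimized alternately, each network nudging the other.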
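The spectral normalization paper above constrains each layer's weight matrix so its largest singular value (its spectral norm) is 1, estimated cheaply with power iteration. Here is a minimal NumPy sketch of the idea; the function name and iteration count are illustrative, not the paper's code, which maintains the power-iteration vector across training steps.

```python
import numpy as np

def spectral_normalize(w, n_iters=50):
    """Rescale matrix w so its largest singular value is approximately 1.

    Power iteration estimates the top singular value sigma; dividing by
    sigma bounds the layer's Lipschitz constant, stabilizing GAN training.
    """
    u = np.random.default_rng(0).normal(size=w.shape[0])
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v  # estimated largest singular value
    return w / sigma

w = np.random.default_rng(1).normal(size=(4, 3))
w_sn = spectral_normalize(w)
```

After normalization, the largest singular value of `w_sn` is approximately 1, which can be checked directly with `np.linalg.svd(w_sn, compute_uv=False)[0]`.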
This list isn't exhaustive, but these are the important papers that prepare you to understand state-of-the-art research. My prediction for 2021 (when this article was written) is that there will be widespread use of transformers and fusion with language, e.g. text-to-image models like OpenAI's DALL-E.
Hope you enjoyed reading this article. If you're interested in implementing these models, you can find them in the book "Hands-on Image Generation with TensorFlow". You can read the overview at https://soon-yau.medium.com/learn-and-master-ai-for-image-generation-423978e2f95f?sk=7ddc810a5f86021bc79792bf6af2eaed