Writing StyleGAN from scratch with TensorFlow to edit faces (and CycleGAN, GauGAN, BigGAN and many more)
Nowadays, we can’t escape social media posts showing the amazing results AI can create. These include using deepfakes to swap faces with actors in Hollywood films, transforming a face into a different gender or age, or, more recently, turning your pyjamas into a suit for a video conference call! This all looks like magic. Indeed, any sufficiently advanced technology is indistinguishable from magic.
These are made possible by recent advances in deep learning, more specifically deep neural networks known as Generative Adversarial Networks (GANs) and autoencoders. There are many free online resources that teach how to use GANs, but they tend to use simple GANs on toy datasets such as MNIST. On the other hand, many researchers publish code for their state-of-the-art models, but it is often written in an obscure way that is optimized for model performance at the expense of readability. More importantly, at the time of writing, I could not find any teaching materials, be it textbooks or online courses, that systematically teach the important techniques needed to understand these models. This motivated me to write a book to bridge the gap.
In this blog post, I will introduce the book’s content, which will take you from understanding basic concepts through to implementing complex models.
All images and videos in this blog post were generated using code from the book, and all of the code was written from scratch. There are 18 model implementations that run in Jupyter Notebook or Google Colab.
What Will You Learn?
Below are the book chapters:
1. Getting Started with Image Generation with TensorFlow. We will learn the basic concepts of probability and how they are used to create probabilistic generative models. We will learn how to use TensorFlow 2 to build a custom layer for PixelCNN to generate our first handwritten digit (MNIST) images.
2. Variational Autoencoder. The autoencoder is a versatile model that can be used in many image processing tasks such as denoising, super-resolution, and creating deepfakes. In this chapter, we will learn how to use it to generate and edit faces in a Jupyter notebook.
3. Generative Adversarial Network (GAN). This includes everything you’ll need to know about GANs. We will use GANs to generate Fashion-MNIST images. Most introductory books and tutorials stop at conditional GAN or DCGAN, but that is only the beginning in this book. We will go on to implement WGAN and WGAN-GP to stabilize the training of GANs.
4. Image-to-Image Translation. Many computer vision tasks can be framed as image-to-image translation, e.g. sketch-to-image, day-to-night, segmentation map-to-photo, etc. There are quite a number of models to implement here, namely pix2pix, CycleGAN and BicycleGAN. We will go through them one by one; it is not as scary as it sounds.
5. Style Transfer. We use neural style transfer to convert a photo into an artistic painting. Although neural style transfer also uses deep neural networks, it is technically different from training a GAN or autoencoder model. However, it has had a profound impact on the subsequent development of style-based GANs.
6. AI Painter. We’ll first go over how to use iGAN (Interactive GAN) to manipulate images, then we will implement Nvidia’s GauGAN to generate realistic-looking photos from simple segmentation maps.
7. High Fidelity Face Generation. We will implement two famous models in this chapter, namely Progressive GAN (ProGAN) and StyleGAN, to generate high-definition portrait images. Most of the face generation AI you see online comes from this family of models, which grow the network progressively from a low resolution of 4x4 through 8x8 and so on, up to 1024x1024. Compared to ProGAN, StyleGAN has built-in features that allow for style mixing, as shown in the video at the top of the page. A pre-trained model of 256x256 resolution is provided for you to play with.
8. Self-Attention for Image Generation. Attention modules have been used to create state-of-the-art Natural Language Processing (NLP) models. We will now implement self-attention in two models, Self-Attention GAN (SAGAN) and BigGAN, to generate images based on class labels. This chapter also covers advanced embedding and conditional batch normalization.
9. Video Synthesis. Deepfake technology is now very sophisticated and too involved to cover fully in a single chapter. Therefore, we will implement the very first deepfake algorithm, which uses an autoencoder to perform face swapping in video. You’ll learn not only the deep learning part but also the face image processing, including face detection, alignment and warping, needed to create a deepfake video from scratch.
10. Look Back and Ahead. Many techniques have been implemented for the models throughout this book. This chapter starts by summarizing the generative techniques we learned, including AdaIN (adaptive instance normalization), SPADE (spatially adaptive normalization), class conditional normalization, spectral normalization, orthogonal regularization, KL divergence loss, feature matching loss, Wasserstein loss, LS loss and hinge loss. We will then look at interesting and up-and-coming models and applications, including text-to-image synthesis, image inpainting, video retargeting and neural rendering.
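To give a flavour of what some of the loss functions named in the chapter list look like, here is a minimal sketch of the Wasserstein, hinge and least-squares (LS) losses. It is not the book's code: it uses plain Python lists instead of TensorFlow tensors, the function names are made up for illustration, and it assumes the discriminator (or critic) returns one raw, unbounded score per image. The LS version assumes the common choice of target 1 for real images and 0 for fakes, with a 0.5 scaling factor.

```python
# Illustrative sketch only: function names are hypothetical, and scores are
# assumed to be raw (unbounded) discriminator/critic outputs, one per image.

def mean(values):
    return sum(values) / len(values)

def wasserstein_critic_loss(real_scores, fake_scores):
    # The critic is rewarded for scoring real images higher than fakes,
    # so it minimizes mean(fake) - mean(real).
    return mean(fake_scores) - mean(real_scores)

def wasserstein_generator_loss(fake_scores):
    # The generator wants the critic to score its fakes highly.
    return -mean(fake_scores)

def hinge_discriminator_loss(real_scores, fake_scores):
    # Real scores are only penalized below +1, fake scores only above -1.
    real_term = mean([max(0.0, 1.0 - s) for s in real_scores])
    fake_term = mean([max(0.0, 1.0 + s) for s in fake_scores])
    return real_term + fake_term

def least_squares_discriminator_loss(real_scores, fake_scores):
    # Assumed targets: 1 for real images, 0 for fakes.
    real_term = mean([(s - 1.0) ** 2 for s in real_scores])
    fake_term = mean([s ** 2 for s in fake_scores])
    return 0.5 * (real_term + fake_term)

# Toy scores for two real and two generated images.
real = [2.0, 0.5]
fake = [-1.0, 0.5]

w_critic = wasserstein_critic_loss(real, fake)
w_gen = wasserstein_generator_loss(fake)
hinge = hinge_discriminator_loss(real, fake)
ls = least_squares_discriminator_loss(real, fake)
print(w_critic, w_gen, hinge, ls)
```

In a real training loop these would operate on batches of tensors and be differentiated automatically, but the arithmetic is the same.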
Who Is This Book For?
This book covers most, if not all, of the important techniques for creating state-of-the-art image generation models. It is suitable for both engineers and early-stage researchers who want to understand how GANs work and to build them from scratch. To make the most out of this book, you should already have basic knowledge of creating convolutional neural networks for image classification. This book uses TensorFlow 2’s Keras APIs, but you don’t need prior knowledge of them if you have used another modern deep learning framework such as PyTorch. You’ll be able to pick up TF2 as you go along.
After reading this book, you’ll be well-versed in using AI for image generation and have a solid foundation to explore emerging generative AI.
The book is now available to buy on Amazon: https://www.amazon.com/dp/1838826785