In this tutorial we will explore the various ways to perform image augmentation with TensorFlow. We will cover three approaches, each in its own section.

At each step we will also look at the pros and cons of the method being discussed.

Note: This tutorial doesn't cover model training or data preprocessing. We will only test our augmentation functions on the tf_flowers dataset, which is available in tensorflow_datasets.

Introduction

Data augmentation artificially increases the size of the training set by generating many realistic variants of each training instance. The added variety in the training data helps to mitigate overfitting.
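As a rough illustration of the idea, the sketch below generates several variants of a single image using plain NumPy. The `make_variants` helper, the horizontal flip and the brightness range are assumptions made for this example only; they are not the methods used later in this tutorial.

```python
import numpy as np

def make_variants(image, n_variants=4, seed=0):
    """Return n_variants randomly altered copies of one uint8 image (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        out = image.astype(np.float32)
        # Flip horizontally with probability 0.5.
        if rng.random() < 0.5:
            out = out[:, ::-1, :]
        # Scale brightness by a random factor in [0.8, 1.2] (assumed range).
        out = out * rng.uniform(0.8, 1.2)
        variants.append(np.clip(out, 0, 255).astype(np.uint8))
    return variants

# A small synthetic RGB image stands in for a real training instance.
image = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
variants = make_variants(image)
print(len(variants), variants[0].shape)
```

Each variant keeps the shape and label of the original instance, so one labelled image yields several training examples.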

TensorFlow offers a number of different ways to apply data augmentation to image datasets. Here we will explore three of them.

Let’s get started!

Experimental setup

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow import keras
import tensorflow_datasets as tfds

from sklearn.datasets import load_sample_image
import matplotlib.pyplot as plt
import albumentations as A
from PIL import Image
import numpy as np
import math

tfa.register_all()
tf.random.set_seed(42)
np.random.seed(42)

autotune = tf.data.experimental.AUTOTUNE
print("Tensorflow Version          : ", tf.__version__)
print("Tensorflow Addons Version   : ", tfa.__version__)
print("Tensorflow Datasets Version : ", tfds.__version__)
print("Albumentations Version      : ", A.__version__)

Tensorflow Version          :  2.3.0
Tensorflow Addons Version   :  0.11.2
Tensorflow Datasets Version :  0.11.2
Albumentations Version      :  0.4.5

We are going to use the tf_flowers dataset for our experiments. Loading the dataset in tf.data format takes a single API call with tensorflow_datasets, as shown below:

#load in a sample dataset to perform image augmentation
ds_height = 120
ds_width  = 120
ds_batch  = 32

dataset, info = tfds.load("tf_flowers", as_supervised=True, with_info=True, split="train",)
class_names = info.features["label"].names
print(class_names)

# Functions to display the images used in this experiment
def show_image_batch(images: list):
    """
    Displays the first six images in `images` on a 2x3 grid.
    """
    plt.figure(figsize=(10, 5))
    for idx, img in enumerate(images[:6]):
        plt.subplot(2, 3, idx + 1)
        plt.imshow(img)
        plt.axis("off")

def show_dataset(dataset):
    batch = next(iter(dataset))
    images, labels = batch
    
    plt.figure(figsize=(10, 10))
    for idx in range(9):
        ax = plt.subplot(3, 3, idx + 1)
        plt.imshow(images[idx].numpy().astype("uint8"))
        plt.title("Class: {}".format(class_names[labels[idx].numpy()]))
        plt.axis("off")

We will inspect a few images and their labels to make sure the dataset was parsed correctly. This helps to reduce errors further down the pipeline:

# Resize the images of the dataset so that they all have the same shape,
# because TensorFlow requires every image in a batch to have the same
# dimensions.
temp_ds = dataset.map(lambda x, y: (tf.image.resize(x, size=[ds_height, ds_width]), y))
show_dataset(temp_ds.batch(9))
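The same shape constraint can be seen with plain NumPy arrays, which gives a rough picture of why the resize step comes before batching. This is only an analogy: `resize_nearest` is a hypothetical helper written for this sketch, whereas `tf.image.resize` uses bilinear interpolation by default.

```python
import numpy as np

def resize_nearest(image, height, width):
    """Nearest-neighbour resize via integer index sampling (illustrative helper only)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]

# Two synthetic images with different sizes, as in a real photo dataset.
images = [np.zeros((240, 320, 3), np.uint8), np.zeros((333, 500, 3), np.uint8)]

# Stacking images of different shapes into one batch fails.
try:
    np.stack(images)
    stacked_raw = True
except ValueError:
    stacked_raw = False

# After resizing to a common size, the images stack into a single batch array.
batch = np.stack([resize_nearest(img, 120, 120) for img in images])
print(stacked_raw, batch.shape)
```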

Let's grab a sample image. We will use this image to test our augmentation functions. At each step we will repeatedly apply the augmentation function to the same image and view the results.

h = 210
w = 250

image = load_sample_image("flower.jpg")
image = Image.fromarray(image)
image = image.resize(size=(w,h))
image
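The "apply repeatedly and view" pattern described above could be sketched as follows. The names `random_flip` and `apply_repeatedly` are hypothetical helpers introduced for this illustration, and a synthetic NumPy array stands in for the sample image.

```python
import numpy as np

def random_flip(image, rng):
    """Hypothetical augmentation: flip the image left-right about half of the time."""
    return image[:, ::-1, :] if rng.random() < 0.5 else image

def apply_repeatedly(augment_fn, image, n=6, seed=42):
    """Apply augment_fn to the same image n times and collect the results."""
    rng = np.random.default_rng(seed)
    return [augment_fn(image, rng) for _ in range(n)]

# A synthetic array stands in for the sample image loaded above.
sample = np.zeros((210, 250, 3), dtype=np.uint8)
sample[:, :125, :] = 255  # left half white, so a flip is visible

results = apply_repeatedly(random_flip, sample)
```

A list of six images like `results` is the kind of input that `show_image_batch` displays on its 2x3 grid.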

Now that we have our sample image and the tf_flowers dataset ready, let's proceed with creating our image augmentation pipelines:

Using tf.image and tfa.image