Transforms to apply data augmentation in Computer Vision for Image Classification Tasks
from fastcore.all import Path
from nbdev.export import Config
from torch.utils.data import DataLoader

from gale.classification.core import *
from gale.collections.download import download_and_extract_archive

URL = "https://download.pytorch.org/tutorial/hymenoptera_data.zip"
data_path = Path(Config().path("nbs_path")) / "data"

# download a toy dataset
download_and_extract_archive(url=URL, download_root=data_path, extract_root=data_path)

# build a parser over the training split of the dataset
path = data_path / "hymenoptera_data"

parser = FolderParser(path / "train")
Using downloaded and verified file: /Users/ayushman/Desktop/gale/nbs/data/hymenoptera_data.zip
Extracting /Users/ayushman/Desktop/gale/nbs/data/hymenoptera_data.zip to /Users/ayushman/Desktop/gale/nbs/data

imagenet_no_augment_transform

imagenet_no_augment_transform(size:Union[Sequence, int]=224, interpolation:str='bilinear')

The default image transform without data augmentation.

It is often useful for evaluating models on ImageNet. It resizes the image and then takes a center crop.

Args:

  • size: Desired output size. If size is a sequence like (h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
  • interpolation: Desired interpolation mode, given by name (for example 'bilinear' or 'bicubic').
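
The sizing rule for size can be checked with a small, library-free sketch. resized_shape is a hypothetical helper, not part of gale or torchvision; it only reproduces the arithmetic described above, and rounding by truncation is an assumption.

```python
from typing import Sequence, Tuple, Union


def resized_shape(
    height: int, width: int, size: Union[int, Sequence[int]]
) -> Tuple[int, int]:
    """Return the (height, width) of an image after the resize step.

    Hypothetical helper for illustration only; truncating to int is an assumption.
    """
    if isinstance(size, int):
        # An int matches the smaller edge and preserves the aspect ratio.
        if height > width:
            return int(size * height / width), size
        return size, int(size * width / height)
    # A sequence (h, w) is used as the output size directly.
    h, w = size
    return int(h), int(w)


print(resized_shape(500, 375, 224))         # portrait: the width is the smaller edge
print(resized_shape(375, 500, 224))         # landscape: the height is the smaller edge
print(resized_shape(500, 375, (224, 224)))  # explicit (h, w)
```

With an int, the aspect ratio is preserved and the later center crop trims the longer edge; with an (h, w) sequence, the image is resized to that shape directly.
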
transforms = imagenet_no_augment_transform(size=(224, 224), interpolation="bicubic")
mapper = ClassificationMapper(transforms)
dset = ClassificationDataset(mapper, parser)
dls = DataLoader(dset, batch_size=8, shuffle=False)
samples = next(iter(dls))
show_image_batch(samples)

imagenet_augment_transform

imagenet_augment_transform(size:int=224, scale:Optional[float]=None, ratio:Optional[float]=None, interpolation:str='random', hflip:Union[float, bool]=0.5, vflip:Union[float, bool]=False, color_jitter:Union[Sequence, float]=0.4, auto_augment:Optional[str]=None, mean:Optional[Sequence[float]]=(0.485, 0.456, 0.406))

The default image transform with data augmentation. It is often useful for training models on ImageNet.

Adapted from: https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/transforms_factory.py

Args:

  • size: Desired output size. If size is a sequence like (h, w), the output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
  • scale: Range of the size of the random crop relative to the original image size.
  • ratio: Range of the aspect ratio of the random crop relative to the original aspect ratio.
  • interpolation: Desired interpolation mode, given by name.
  • hflip: Probability for random horizontal flip. Active if not False and > 0.
  • vflip: Probability for random vertical flip. Active if not False and > 0.
  • color_jitter: Values of brightness, contrast, saturation and hue for torchvision.transforms.ColorJitter.
  • auto_augment: String defining configuration of auto augmentation policies. Currently supports only AutoAugment and RandAugment.
  • mean: Image mean, required only if auto_augment is not None.
transforms = imagenet_augment_transform(
    size=(224, 224),
    hflip=0.5,
    vflip=0.5,
    color_jitter=0.4,
)
mapper = ClassificationMapper(transforms)
dset = ClassificationDataset(mapper, parser)
dls = DataLoader(dset, batch_size=8, shuffle=False)
samples = next(iter(dls))
show_image_batch(samples)

Applying AutoAugment:

To use AutoAugment, create a config string that specifies the parameters. Under the hood, imagenet_augment_transform uses auto_augment_transform from timm. The config string consists of multiple sections separated by dashes ('-'). The first section defines the AutoAugment policy (one of 'v0', 'v0r', 'original', 'originalr'). The remaining sections, in any order, determine mstd (float, standard deviation of the magnitude noise applied).

Example: original-mstd0.5 results in AutoAugment with the original policy and magnitude_std 0.5.
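
A config string of this form can also be assembled programmatically. The helper below is a hypothetical convenience, not part of gale or timm; it only encodes the rules stated above (policy name first, optional mstd section).

```python
from typing import Optional

# Policy names listed in the text above.
AUTOAUGMENT_POLICIES = ("v0", "v0r", "original", "originalr")


def autoaugment_config(policy: str = "original", mstd: Optional[float] = None) -> str:
    """Build an AutoAugment config string such as 'original-mstd0.5'.

    Hypothetical helper for illustration only.
    """
    if policy not in AUTOAUGMENT_POLICIES:
        raise ValueError(
            f"unknown policy {policy!r}, expected one of {AUTOAUGMENT_POLICIES}"
        )
    sections = [policy]
    if mstd is not None:
        # mstd is the std deviation of the noise applied to the magnitude.
        sections.append(f"mstd{mstd}")
    return "-".join(sections)


print(autoaugment_config("original", mstd=0.5))
print(autoaugment_config("v0"))
```

The resulting string can be passed as the auto_augment argument of imagenet_augment_transform.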

transforms = imagenet_augment_transform(
    size=(224, 224), hflip=0.5, vflip=0.5, auto_augment="original-mstd0.5"
)
mapper = ClassificationMapper(transforms)
dset = ClassificationDataset(mapper, parser)
dls = DataLoader(dset, batch_size=8, shuffle=False)
samples = next(iter(dls))
show_image_batch(samples)

Applying RandAugment:

To use RandAugment, create a config string that specifies the parameters. Under the hood, imagenet_augment_transform uses rand_augment_transform from timm. The config string consists of multiple sections separated by dashes ('-'). The first section defines the specific variant of RandAugment (currently only 'rand').

The remaining sections, in any order, determine:

  • m - integer magnitude of rand augment
  • n - integer num layers (number of transform ops selected per image)
  • w - integer probability weight index (index of a set of weights to influence choice of op)
  • mstd - float std deviation of magnitude noise applied
  • inc - integer (bool), use augmentations that increase in severity with magnitude (default: 0)

Examples: rand-m9-n3-mstd0.5 results in RandAugment with magnitude 9, num_layers 3, and magnitude_std 0.5; rand-mstd1-w0 results in magnitude_std 1.0, weights 0, the default magnitude of 10, and the default num_layers of 2.
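
To make the format concrete, here is a minimal sketch of how such a config string could be decoded. It is not timm's parser: parse_rand_augment_config and the keys of the returned dictionary are hypothetical, and the defaults (magnitude 10, num_layers 2, inc 0) are taken from the description above.

```python
from typing import Dict, Union


def parse_rand_augment_config(config_str: str) -> Dict[str, Union[int, float, bool]]:
    """Decode a RandAugment config string into a dict of parameters.

    Hypothetical helper for illustration only; it is not timm's implementation.
    """
    sections = config_str.split("-")
    if sections[0] != "rand":
        raise ValueError("the first section must be 'rand'")

    # Defaults as described in the text above.
    params: Dict[str, Union[int, float, bool]] = {
        "magnitude": 10,
        "num_layers": 2,
        "increasing": False,
    }
    # The remaining sections may appear in any order.
    for section in sections[1:]:
        # Split e.g. 'mstd0.5' into the key 'mstd' and the value '0.5'.
        key = section.rstrip("0123456789.")
        value = section[len(key):]
        if not value:
            raise ValueError(f"section {section!r} has no value")
        if key == "m":
            params["magnitude"] = int(value)
        elif key == "n":
            params["num_layers"] = int(value)
        elif key == "w":
            params["weights"] = int(value)
        elif key == "mstd":
            params["magnitude_std"] = float(value)
        elif key == "inc":
            params["increasing"] = bool(int(value))
        else:
            raise ValueError(f"unknown section {section!r}")
    return params


print(parse_rand_augment_config("rand-m9-n3-mstd0.5"))
print(parse_rand_augment_config("rand-mstd1-w0"))
```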

transforms = imagenet_augment_transform(
    size=(224, 224), hflip=0.5, vflip=0.5, auto_augment="rand-m9-n3-mstd0.5"
)

mapper = ClassificationMapper(transforms)
dset = ClassificationDataset(mapper, parser)
dls = DataLoader(dset, batch_size=8, shuffle=False)
samples = next(iter(dls))
show_image_batch(samples)

aug_transforms

aug_transforms(presize:int=260, size:int=224, interpolation:int=1, hflip:Union[float, bool]=0.5, vflip:Union[float, bool]=False, max_lighting:Union[float, bool]=0.2, p_lighting:float=0.75, max_rotate:Union[float, int]=10.0, p_rotate:float=0.5, max_warp:Union[float, bool]=0.2, p_affine:float=0.75, pad_mode:str='reflect', mult:float=1.0, xtra_tfms:Optional[List]=None)

Utility function to easily create a list of flip, rotate, zoom, and lighting transforms. Inspired by: https://docs.fast.ai/vision.augment.html#aug_transforms

This function uses transformations from the albumentations library. First the image is resized to presize, then the transformations are applied, and finally we RandomCrop to size. HorizontalFlip (or VerticalFlip if vflip=True) with p=hflip (or p=vflip) is added when hflip=True. With probability p_rotate we apply an albumentations.Rotate of up to max_rotate degrees and, if max_warp is set, an albumentations.IAAAffine with scale=max_warp and mode=pad_mode. With probability p_lighting we apply a change in brightness and contrast of up to max_lighting. Custom xtra_tfms can be added; these must be a List containing transformations from albumentations. max_rotate, max_lighting, and max_warp are multiplied by mult, so you can more easily increase or decrease augmentation with a single parameter.
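
The effect of mult can be illustrated without albumentations. The sketch below assumes, as the paragraph above states, that mult simply multiplies max_rotate, max_lighting, and max_warp; effective_strengths is a hypothetical helper, not a gale function.

```python
from typing import Dict


def effective_strengths(
    max_rotate: float = 10.0,
    max_lighting: float = 0.2,
    max_warp: float = 0.2,
    mult: float = 1.0,
) -> Dict[str, float]:
    """Return the augmentation strengths after scaling by mult.

    Hypothetical helper for illustration only; the defaults mirror the
    aug_transforms signature shown above.
    """
    return {
        "max_rotate": max_rotate * mult,
        "max_lighting": max_lighting * mult,
        "max_warp": max_warp * mult,
    }


print(effective_strengths(mult=1.0))  # the defaults, unchanged
print(effective_strengths(mult=2.0))  # every strength doubled
```

Under this assumption, mult=2.0 doubles the rotation range and the lighting and warp strengths while leaving the probabilities p_rotate, p_lighting, and p_affine untouched.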

transforms = aug_transforms(260, 224, mult=1.0)

mapper = ClassificationMapper(transforms)
dset = ClassificationDataset(mapper, parser)
dls = DataLoader(dset, batch_size=8, shuffle=False)

samples = next(iter(dls))
show_image_batch(samples)

from timm.data.random_erasing import RandomErasing

re = RandomErasing(1.0, mode="pixel", device="cpu")

# # or alternatively:
# from torchvision.transforms import RandomErasing
# re = RandomErasing(1.0)

transforms = aug_transforms(260, 224, mult=2.0)

mapper = ClassificationMapper(transforms, xtras=re)
dset = ClassificationDataset(mapper, parser)

dls = DataLoader(dset, batch_size=8, shuffle=False)

samples = next(iter(dls))
show_image_batch(samples)