LRSchedulers to schedule learning rates

Learning rate schedulers can be used to schedule the learning rate of any optimizer in PyTorch. All learning rate schedulers need to inherit from PyTorch's _LRScheduler class.

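For example, a minimal custom scheduler (a toy sketch, not part of gale) simply subclasses _LRScheduler and implements get_lr -

from torch.optim.lr_scheduler import _LRScheduler

class ConstantScheduler(_LRScheduler):
    "Toy scheduler that keeps every param group at its base learning rate."

    def get_lr(self):
        # base_lrs is populated by _LRScheduler from the optimizer's param groups
        return [base_lr for base_lr in self.base_lrs]
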
import torch
import matplotlib.pyplot as plt

from gale.optimizer import Adam

Generate a few mock parameters to test the schedulers -

# mock training length
epoch: int = 10
batch_nb: int = 10
max_steps: int = epoch * batch_nb

# mock model
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]

# mock optimizer
optim = Adam(model, lr=1e-04)

class LRMultiplier[source]

LRMultiplier(optimizer:Optimizer, multiplier:ParamScheduler, max_iter:int, last_iter:int=-1) :: _LRScheduler

An LRScheduler which uses an fvcore ParamScheduler to multiply the learning rate of each parameter in the optimizer. At every step, the learning rate of each parameter becomes its initial value multiplied by the output of the given ParamScheduler. The absolute learning rate value of each parameter can be different; this scheduler can be used as long as the relative scale among them does not change during training.

Source: https://github.com/facebookresearch/detectron2/blob/master/detectron2/solver/lr_scheduler.py

An LRMultiplier can be used to convert any fvcore ParamScheduler into a learning rate scheduler -

from fvcore.common.param_scheduler import PolynomialDecayParamScheduler

param_scheduler = PolynomialDecayParamScheduler(1, 10)
scheduler = LRMultiplier(optim, param_scheduler, max_steps)

lrs = []

for _ in range(max_steps):
    # step the optimizer before the scheduler, as PyTorch expects
    optim.step()
    scheduler.step()
    lrs.append(scheduler.get_lr())

plt.plot(lrs)

class WarmupParamScheduler[source]

WarmupParamScheduler(scheduler:ParamScheduler, warmup_factor:float, warmup_length:float, warmup_method:str='linear') :: CompositeParamScheduler

Add an initial warmup stage to another scheduler.

Source - https://github.com/facebookresearch/fvcore/blob/master/fvcore/common/param_scheduler.py

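A WarmupParamScheduler can be paired with LRMultiplier to add a warmup stage to any fvcore schedule. A minimal sketch, assuming the signature above and fvcore's CosineParamScheduler -

from fvcore.common.param_scheduler import CosineParamScheduler

# cosine decay of the multiplier from 1.0 to 0.0, preceded by a linear warmup
# over the first 10% of training that starts at 0.001x the base value
cosine = CosineParamScheduler(1, 0)
warmup_cosine = WarmupParamScheduler(cosine, warmup_factor=0.001, warmup_length=0.1)
scheduler = LRMultiplier(optim, warmup_cosine, max_steps)
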
CosineLR[source]

CosineLR(optim:Optimizer, max_iters:int, start_value:int=1, end_value:int=0)

Cosine decay or cosine warmup schedules based on start and end values. These values are relative to your optimizer's learning rates. This scheduler is meant to be called after each training step.

scheduler = CosineLR(optim, max_iters=max_steps)

lr = []

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr.append(scheduler.get_lr())

plt.plot(lr, label="Cosine Annealing Schedule")
plt.legend()

FlatCosScheduler[source]

FlatCosScheduler(optimizer:Optimizer, pct_start:float, max_iters:int)

Keeps the learning rate flat for pct_start of max_iters before cosine annealing it. This scheduler is meant to be called after each training batch.

Inspired From - https://docs.fast.ai/callback.schedule.html#Learner.fit_flat_cos.

lr = []

scheduler = FlatCosScheduler(optim, pct_start=0.72, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr.append(scheduler.get_lr())

plt.plot(lr, label="Flat Cosine Schedule")
plt.legend()

WarmupCosineLR[source]

WarmupCosineLR(optimizer:Optimizer, max_iters:int, pct_start:Optional[float]=None, warmup_steps:Optional[int]=None, warmup_factor:float=0.001)

Linearly increase the lr for pct_start of max_iters (or for warmup_steps) before cosine annealing it for the remaining steps.

Arguments to WarmupCosineLR -

  • optimizer (Optimizer): Wrapped optimizer.
  • max_iters (int): The total number of steps to train for.
  • pct_start (float): The fraction of steps spent increasing the learning rate. Default: None
  • warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
  • warmup_factor (float): The multiplicative factor, w.r.t. the initial lr, at which the warmup starts. Default: 0.001

lr1 = []

scheduler = WarmupCosineLR(optim, pct_start=0.1, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr1.append(scheduler.get_lr())

plt.plot(lr1, label="CosineLR + Warmup")
plt.legend()

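The warmup can also be specified as an explicit number of steps instead of a fraction; a sketch assuming warmup_steps behaves as documented above -

# 10 warmup steps out of max_steps, equivalent to pct_start=0.1 here since max_steps is 100
scheduler = WarmupCosineLR(optim, max_iters=max_steps, warmup_steps=10)
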
WarmupLinearLR[source]

WarmupLinearLR(optimizer:Optimizer, max_iters:int, pct_start:Optional[float]=None, warmup_steps:Optional[int]=None, warmup_factor:float=0.001)

Linearly increase the lr for pct_start of max_iters (or for warmup_steps) before linearly decreasing it for the remaining steps.

Arguments to WarmupLinearLR -

  • optimizer (Optimizer): Wrapped optimizer.
  • max_iters (int): The total number of steps to train for.
  • pct_start (float): The fraction of steps spent increasing the learning rate. Default: None
  • warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
  • warmup_factor (float): The multiplicative factor, w.r.t. the initial lr, at which the warmup starts. Default: 0.001

lr2 = []

scheduler = WarmupLinearLR(optim, pct_start=0.1, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr2.append(scheduler.get_lr())

plt.plot(lr2, label="LinearLR + Warmup")
plt.legend()

WarmupConstantLR[source]

WarmupConstantLR(optimizer:Optimizer, max_iters:int, pct_start:Optional[float]=None, warmup_steps:Optional[int]=None, warmup_factor:float=0.001)

Linearly increase the lr for pct_start of max_iters (or for warmup_steps), after which the lr is kept constant.

Arguments to WarmupConstantLR -

  • optimizer (Optimizer): Wrapped optimizer.
  • max_iters (int): The total number of steps to train for.
  • pct_start (float): The fraction of steps spent increasing the learning rate. Default: None
  • warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
  • warmup_factor (float): The multiplicative factor, w.r.t. the initial lr, at which the warmup starts. Default: 0.001

lr3 = []

scheduler = WarmupConstantLR(optim, pct_start=0.1, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr3.append(scheduler.get_lr())

plt.plot(lr3, label="ConstantLR + Warmup")
plt.legend()

Comparison of all the Warmup LR Schedulers -

plt.plot(lr1, label="CosineLR + Warmup")
plt.plot(lr2, label="LinearLR + Warmup")
plt.plot(lr3, label="ConstantLR + Warmup")
plt.legend()
plt.title("Warmup LR Schedulers")

WarmupStepLR[source]

WarmupStepLR(optimizer:Optimizer, epochs:int, num_decays:int, warmup_epochs:int=0, decay_rate:float=1.0, warmup_factor:float=1e-05)

Decays the learning rate of each parameter group by decay_rate at num_decays equally spaced epochs. A warmup stage can optionally be added using warmup_epochs. This scheduler is meant to be called after each epoch.

lr1, lr2 = [], []
epoch = 100

# fmt: off
scheduler = WarmupStepLR(optim, epochs=epoch, num_decays=5, warmup_epochs=0, decay_rate=0.8)
wrm_scheduler = WarmupStepLR(optim, epochs=epoch, num_decays=5, warmup_epochs=3, decay_rate=0.8)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
    
    # scheduler is called after each epoch
    scheduler.step()
    wrm_scheduler.step()
    lr1.append(scheduler.get_lr())
    lr2.append(wrm_scheduler.get_lr())

plt.plot(lr2, label="With Warmup")
plt.plot(lr1, alpha=0.8, label="Without Warmup")
plt.legend();