LRSchedulers to schedule learning rates

Learning rate schedulers can be used to schedule the learning rate of any optimizer in PyTorch. All learning rate schedulers need to inherit from PyTorch's _LRScheduler class.

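For example, a minimal custom scheduler (a toy sketch, not part of gale) simply subclasses _LRScheduler and implements get_lr -

from torch.optim.lr_scheduler import _LRScheduler

class ConstantScheduler(_LRScheduler):
    "Toy scheduler that keeps every param group at its base learning rate."

    def get_lr(self):
        # base_lrs is populated by _LRScheduler from the optimizer's param groups
        return [base_lr for base_lr in self.base_lrs]
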
import torch
import matplotlib.pyplot as plt

from gale.optimizer import Adam

Generate a few mock parameters to test the schedulers -

# mock training length
epoch: int = 10
batch_nb: int = 10
max_steps: int = epoch * batch_nb

# mock model
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]

# mock optimizer
optim = Adam(model, lr=1e-04)

class LRMultiplier[source]

LRMultiplier(optimizer:Optimizer, multiplier:ParamScheduler, max_iter:int, last_iter:int=-1) :: _LRScheduler

An LRScheduler which uses an fvcore ParamScheduler to multiply the learning rate of each parameter in the optimizer. At every step, the learning rate of each parameter becomes its initial value multiplied by the output of the given ParamScheduler. The absolute learning rate value of each parameter can be different; this scheduler can be used as long as the relative scale among them does not change during training.

Source: https://github.com/facebookresearch/detectron2/blob/master/detectron2/solver/lr_scheduler.py

An LRMultiplier can be used to convert any fvcore ParamScheduler into a learning rate scheduler -

from fvcore.common.param_scheduler import PolynomialDecayParamScheduler

param_scheduler = PolynomialDecayParamScheduler(1, 10)
scheduler = LRMultiplier(optim, param_scheduler, max_steps)

lrs = []

for _ in range(max_steps):
    # step the optimizer before the scheduler, as PyTorch expects
    optim.step()
    scheduler.step()
    lrs.append(scheduler.get_lr())

plt.plot(lrs)

class WarmupParamScheduler[source]

WarmupParamScheduler(scheduler:ParamScheduler, warmup_factor:float, warmup_length:float, warmup_method:str='linear') :: CompositeParamScheduler

Add an initial warmup stage to another scheduler.

Source - https://github.com/facebookresearch/fvcore/blob/master/fvcore/common/param_scheduler.py

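A WarmupParamScheduler can be paired with LRMultiplier to add a warmup stage to any fvcore schedule. A minimal sketch, assuming the signature above and fvcore's CosineParamScheduler -

from fvcore.common.param_scheduler import CosineParamScheduler

# cosine decay of the multiplier from 1.0 to 0.0, preceded by a linear warmup
# over the first 10% of training that starts at 0.001x the base value
cosine = CosineParamScheduler(1, 0)
warmup_cosine = WarmupParamScheduler(cosine, warmup_factor=0.001, warmup_length=0.1)
scheduler = LRMultiplier(optim, warmup_cosine, max_steps)
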
CosineLR[source]

CosineLR(optim:Optimizer, max_iters:int, start_value:int=1, end_value:int=0)

Cosine decay or cosine warmup schedules based on start and end values. These values are relative to your optimizer's learning rates. This scheduler is meant to be called after each training step.

scheduler = CosineLR(optim, max_iters=max_steps)

lr = []

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr.append(scheduler.get_lr())

plt.plot(lr, label="Cosine Annealing Schedule")
plt.legend()

FlatCosScheduler[source]

FlatCosScheduler(optimizer:Optimizer, pct_start:float, max_iters:int)

Keeps the learning rate flat for pct_start of max_iters before cosine annealing it. This scheduler is meant to be called after each training batch.

Inspired From - https://docs.fast.ai/callback.schedule.html#Learner.fit_flat_cos.

lr = []

scheduler = FlatCosScheduler(optim, pct_start=0.72, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr.append(scheduler.get_lr())

plt.plot(lr, label="Flat Cosine Schedule")
plt.legend()

WarmupCosineLR[source]

WarmupCosineLR(optimizer:Optimizer, max_iters:int, pct_start:Optional[float]=None, warmup_steps:Optional[int]=None, warmup_factor:float=0.001)

Linearly increase the lr for pct_start of max_iters (or for warmup_steps) before cosine annealing it for the remaining steps.

Arguments to WarmupCosineLR -

  • optimizer (Optimizer): Wrapped optimizer.
  • max_iters (int): The total number of steps to train for.
  • pct_start (float): The fraction of steps spent increasing the learning rate. Default: None
  • warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
  • warmup_factor (float): The multiplicative factor, w.r.t. the initial lr, at which the warmup starts. Default: 0.001

lr1 = []

scheduler = WarmupCosineLR(optim, pct_start=0.1, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr1.append(scheduler.get_lr())

plt.plot(lr1, label="CosineLR + Warmup")
plt.legend()

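The warmup can also be specified as an explicit number of steps instead of a fraction; a sketch assuming warmup_steps behaves as documented above -

# 10 warmup steps out of max_steps, equivalent to pct_start=0.1 here since max_steps is 100
scheduler = WarmupCosineLR(optim, max_iters=max_steps, warmup_steps=10)
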
WarmupLinearLR[source]

WarmupLinearLR(optimizer:Optimizer, max_iters:int, pct_start:Optional[float]=None, warmup_steps:Optional[int]=None, warmup_factor:float=0.001)

Linearly increase the lr for pct_start of max_iters (or for warmup_steps) before linearly decreasing it for the remaining steps.

Arguments to WarmupLinearLR -

  • optimizer (Optimizer): Wrapped optimizer.
  • max_iters (int): The total number of steps to train for.
  • pct_start (float): The fraction of steps spent increasing the learning rate. Default: None
  • warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
  • warmup_factor (float): The multiplicative factor, w.r.t. the initial lr, at which the warmup starts. Default: 0.001

lr2 = []

scheduler = WarmupLinearLR(optim, pct_start=0.1, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr2.append(scheduler.get_lr())

plt.plot(lr2, label="LinearLR + Warmup")
plt.legend()

WarmupConstantLR[source]

WarmupConstantLR(optimizer:Optimizer, max_iters:int, pct_start:Optional[float]=None, warmup_steps:Optional[int]=None, warmup_factor:float=0.001)

Linearly increase the lr for pct_start of max_iters (or for warmup_steps), after which the lr is kept constant.

Arguments to WarmupConstantLR -

  • optimizer (Optimizer): Wrapped optimizer.
  • max_iters (int): The total number of steps to train for.
  • pct_start (float): The fraction of steps spent increasing the learning rate. Default: None
  • warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
  • warmup_factor (float): The multiplicative factor, w.r.t. the initial lr, at which the warmup starts. Default: 0.001

lr3 = []

scheduler = WarmupConstantLR(optim, pct_start=0.1, max_iters=max_steps)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr3.append(scheduler.get_lr())

plt.plot(lr3, label="ConstantLR + Warmup")
plt.legend()

Comparison of all the Warmup LR Schedulers -

plt.plot(lr1, label="CosineLR + Warmup")
plt.plot(lr2, label="LinearLR + Warmup")
plt.plot(lr3, label="ConstantLR + Warmup")
plt.legend()
plt.title("Warmup LR Schedulers")

WarmupStepLR[source]

WarmupStepLR(optimizer:Optimizer, epochs:int, num_decays:int, warmup_epochs:int=0, decay_rate:float=1.0, warmup_factor:float=1e-05)

Decays the learning rate of each parameter group by decay_rate at num_decays equally spaced epochs. A warmup stage can optionally be added using warmup_epochs. This scheduler is meant to be called after each epoch.

lr1, lr2 = [], []
epoch = 100

# fmt: off
scheduler = WarmupStepLR(optim, epochs=epoch, num_decays=5, warmup_epochs=0, decay_rate=0.8)
wrm_scheduler = WarmupStepLR(optim, epochs=epoch, num_decays=5, warmup_epochs=3, decay_rate=0.8)

for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
    
    # scheduler is called after each epoch
    scheduler.step()
    wrm_scheduler.step()
    lr1.append(scheduler.get_lr())
    lr2.append(wrm_scheduler.get_lr())

plt.plot(lr2, label="With Warmup")
plt.plot(lr1, alpha=0.8, label="Without Warmup")
plt.legend();