LR Schedulers to schedule the learning rate
Learning rate schedulers can be used to schedule the learning rate of any optimizer in PyTorch. All learning rate schedulers need to inherit from PyTorch's _LRScheduler class.
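For reference, here is a minimal sketch of such a subclass; ConstantFactorLR is a hypothetical example for illustration only, not a scheduler shipped with gale -
from torch.optim.lr_scheduler import _LRScheduler

class ConstantFactorLR(_LRScheduler):
    # Hypothetical example: scales every base learning rate by a fixed factor.
    def __init__(self, optimizer, factor=0.5, last_epoch=-1):
        self.factor = factor
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # return one learning rate per parameter group, derived from the base LRs
        return [base_lr * self.factor for base_lr in self.base_lrs]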
import torch
import matplotlib.pyplot as plt

from gale.optimizer import Adam
# the schedulers used below (LRMultiplier, CosineLR, the Warmup*LR variants, etc.) are also provided by gale
Generate a few mock parameters to test the schedulers -
epoch: int = 10
batch_nb: int = 10
max_steps: int = epoch * batch_nb
# mock model
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
# mock optimizer
optim = Adam(model, lr=1e-04)
An LRMultiplier can be used to convert any fvcore ParamScheduler into a learning rate scheduler -
from fvcore.common.param_scheduler import PolynomialDecayParamScheduler
param_scheduler = PolynomialDecayParamScheduler(1, 10)
scheduler = LRMultiplier(optim, param_scheduler, max_steps)
lrs = []
for _ in range(max_steps):
    scheduler.step()
    lrs.append(scheduler.get_lr())
plt.plot(lrs)
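Since LRMultiplier accepts any fvcore ParamScheduler, the same pattern should work with other schedulers too, e.g. fvcore's CosineParamScheduler (a sketch reusing the mock optimizer from above) -
from fvcore.common.param_scheduler import CosineParamScheduler

param_scheduler = CosineParamScheduler(1, 0.1)
scheduler = LRMultiplier(optim, param_scheduler, max_steps)

lrs = []
for _ in range(max_steps):
    scheduler.step()
    lrs.append(scheduler.get_lr())
plt.plot(lrs)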
CosineLR anneals the learning rate following a cosine curve over max_iters steps -
scheduler = CosineLR(optim, max_iters=max_steps)
lr = []
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr.append(scheduler.get_lr())
plt.plot(lr, label="Cosine Annealing Schedule")
plt.legend()
FlatCosScheduler keeps the learning rate flat for the first pct_start fraction of training and then anneals it following a cosine curve -
lr = []
scheduler = FlatCosScheduler(optim, pct_start=0.72, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr.append(scheduler.get_lr())
plt.plot(lr, label="Flat Cosine Schedule")
plt.legend()
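For comparison, a smaller pct_start shortens the flat phase and starts the cosine decay earlier; a sketch using the same API as above -
lr_early = []
scheduler = FlatCosScheduler(optim, pct_start=0.25, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        optim.zero_grad()
        scheduler.step()
        lr_early.append(scheduler.get_lr())
plt.plot(lr_early, label="Flat Cosine Schedule (pct_start=0.25)")
plt.legend()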
Arguments to WarmupCosineLR -
- optimizer (Optimizer): Wrapped optimizer.
- max_iters (int): The total number of steps to train for.
- pct_start (float): The percentage of steps spent increasing the learning rate. Default: None
- warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
- warmup_factor (float): The factor w.r.t. the initial value of the LR in the scheduler.

Note: You must provide either pct_start or warmup_steps, and .step should be called after a batch has been used for training.
lr1 = []
scheduler = WarmupCosineLR(optim, pct_start=0.1, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr1.append(scheduler.get_lr())
plt.plot(lr1, label="CosineLR + Warmup")
plt.legend()
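The warmup length can also be given as an absolute number of steps via warmup_steps instead of pct_start; a sketch assuming warmup_steps=10 (roughly 10% of max_steps here) -
lr1b = []
scheduler = WarmupCosineLR(optim, warmup_steps=10, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr1b.append(scheduler.get_lr())
plt.plot(lr1b, label="CosineLR + Warmup (warmup_steps=10)")
plt.legend()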
Arguments to WarmupLinearLR -
- optimizer (Optimizer): Wrapped optimizer.
- max_iters (int): The total number of steps to train for.
- pct_start (float): The percentage of steps spent increasing the learning rate. Default: None
- warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
- warmup_factor (float): The factor w.r.t. the initial value of the LR in the scheduler.

Note: You must provide either pct_start or warmup_steps, and .step should be called after a batch has been used for training.
lr2 = []
scheduler = WarmupLinearLR(optim, pct_start=0.1, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr2.append(scheduler.get_lr())
plt.plot(lr2, label="LinearLR + Warmup")
plt.legend()
Arguments to WarmupConstantLR -
- optimizer (Optimizer): Wrapped optimizer.
- max_iters (int): The total number of steps to train for.
- pct_start (float): The percentage of steps spent increasing the learning rate. Default: None
- warmup_steps (int): The number of steps spent increasing the learning rate. Default: None
- warmup_factor (float): The factor w.r.t. the initial value of the LR in the scheduler.

Note: You must provide either pct_start or warmup_steps, and .step should be called after a batch has been used for training.
lr3 = []
scheduler = WarmupConstantLR(optim, pct_start=0.1, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr3.append(scheduler.get_lr())
plt.plot(lr3, label="ConstantLR + Warmup")
plt.legend()
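warmup_factor, listed in the arguments above, sets the learning rate at the very start of warmup relative to the base LR; the sketch below assumes it can be combined with pct_start -
lr4 = []
scheduler = WarmupConstantLR(optim, pct_start=0.1, warmup_factor=0.01, max_iters=max_steps)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
        scheduler.step()
        lr4.append(scheduler.get_lr())
plt.plot(lr4, label="ConstantLR + Warmup (warmup_factor=0.01)")
plt.legend()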
Comparison of all the Warmup LR Schedulers -
plt.plot(lr1, label="CosineLR + Warmup")
plt.plot(lr2, label="LinearLR + Warmup")
plt.plot(lr3, label="ConstantLR + Warmup")
plt.legend()
plt.title("Warmup LR Schedulers")
WarmupStepLR decays the learning rate in num_decays discrete steps over epochs epochs, with an optional warmup of warmup_epochs epochs; unlike the schedulers above, .step is called once per epoch -
lr1, lr2 = [], []
epoch = 100
# fmt: off
scheduler = WarmupStepLR(optim, epochs=epoch, num_decays=5, warmup_epochs=0, decay_rate=0.8)
wrm_scheduler = WarmupStepLR(optim, epochs=epoch, num_decays=5, warmup_epochs=3, decay_rate=0.8)
for _ in range(epoch):
    for _ in range(batch_nb):
        optim.step()
    # scheduler is called after each epoch
    scheduler.step()
    wrm_scheduler.step()
    lr1.append(scheduler.get_lr())
    lr2.append(wrm_scheduler.get_lr())
plt.plot(lr2, label="With Warmup")
plt.plot(lr1, alpha=0.8, label="Without Warmup")
plt.legend();