Regularization

Take the (unregularized) binary cross-entropy loss as the base objective:

$$
J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y_{i} \ln \hat{y}_{i}+\left(1-y_{i}\right) \ln \left(1-\hat{y}_{i}\right)\right]
$$

L2-regularization
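
L2 regularization adds a squared-norm penalty on the parameters to this objective. With regularization strength $\lambda$, the loss becomes (a standard restatement; the original expresses it only through code):

$$
J_{\text{reg}}(\theta)=J(\theta)+\frac{\lambda}{2}\|\theta\|_{2}^{2}
$$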

In PyTorch, the L2 penalty is built into the optimizer through the `weight_decay` argument:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda:0')
net = MLP().to(device)
# weight_decay=0.01 applies an L2 penalty to all parameters
optimizer = optim.SGD(net.parameters(),
                      lr=learning_rate,
                      weight_decay=0.01)
criterion = nn.CrossEntropyLoss().to(device)
```
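
As a sanity check, `weight_decay` with plain SGD is equivalent to adding $\frac{\lambda}{2}\|\theta\|_{2}^{2}$ to the loss by hand. The sketch below (an assumed toy setup, not from the original) takes one step each way on two identical linear layers and compares the results:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
lam = 0.01

a = nn.Linear(3, 1)
b = nn.Linear(3, 1)
b.load_state_dict(a.state_dict())  # identical initial weights

x, y = torch.randn(8, 3), torch.randn(8, 1)

# built-in: the optimizer adds lam * w to every gradient
opt_a = optim.SGD(a.parameters(), lr=0.1, weight_decay=lam)
opt_a.zero_grad()
nn.functional.mse_loss(a(x), y).backward()
opt_a.step()

# manual: (lam / 2) * ||w||^2 contributes the same lam * w gradient
opt_b = optim.SGD(b.parameters(), lr=0.1)
opt_b.zero_grad()
loss = nn.functional.mse_loss(b(x), y)
loss = loss + (lam / 2) * sum(p.pow(2).sum() for p in b.parameters())
loss.backward()
opt_b.step()

print(torch.allclose(a.weight, b.weight))  # expected: True
```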

L1-regularization

PyTorch has no built-in L1 penalty, so we have to implement it ourselves:

```python
# accumulate the L1 norm of every parameter tensor
regularization_loss = 0
weight_decay = 0.01
for param in net.parameters():
    regularization_loss += torch.sum(torch.abs(param))

classify_loss = criterion(logits, target)
loss = classify_loss + weight_decay * regularization_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```
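
A common refinement is to penalize only the weight matrices and leave biases unregularized; a hedged variant of the loop above, using `named_parameters` to filter (the `'bias'` name check is an assumption about the model's parameter names):

```python
# penalize weights only; bias terms are left unregularized
regularization_loss = sum(param.abs().sum()
                          for name, param in net.named_parameters()
                          if 'bias' not in name)
```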

Momentum

Plain gradient descent updates the weights directly from the current gradient:

$$
w^{k+1}=w^{k}-\alpha \nabla f\left(w^{k}\right)
$$

With a momentum factor $\beta$, the update instead follows a running accumulation $z$ of past gradients:

$$
\begin{aligned}
z^{k+1} &=\beta z^{k}+\nabla f\left(w^{k}\right) \\
w^{k+1} &=w^{k}-\alpha z^{k+1}
\end{aligned}
$$
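
In PyTorch this is the update `optim.SGD` performs when its `momentum` argument is set; a minimal sketch (`momentum=0.9` is a common choice, assumed here rather than taken from the original):

```python
import torch.optim as optim

# momentum=0.9 plays the role of beta above; the optimizer
# maintains the running buffer z for each parameter internally
optimizer = optim.SGD(net.parameters(),
                      lr=learning_rate,
                      momentum=0.9)
```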