I am trying to train a model in PyTorch.
Input: 686 arrays; first layer: 64 arrays; second layer: 2 arrays; output: a prediction of 1 or 0.
This is what I have so far:
import torch
import torch.nn as nn

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder_softmax = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2),
            nn.Softmax()
        )

    def forward(self, x):
        x = self.encoder_softmax(x)
        return x
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
net = autoencoder().to(device)  # instantiate the model before moving it to the device

iterations = 10
learning_rate = 0.98
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    net.parameters(), lr=learning_rate, weight_decay=1e-5)
for epoch in range(iterations):
    loss = 0.0
    print("train_dl len: ", len(train_dl))
    # net.train()
    for i, data in enumerate(train_dl, 0):
        inputs, labels, vectorize = data
        labels = labels.long().to(device)
        inputs = inputs.float().to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        train_loss = criterion(outputs, labels)
        train_loss.backward()
        optimizer.step()
        loss += train_loss.item()
    loss = loss / len(train_dl)
But when I train the model, the loss does not decrease. What am I doing wrong?
Answer 0 (score: 2):
You are using nn.CrossEntropyLoss as the loss function, which applies log-softmax internally, but you are also applying a softmax in the model itself:
self.encoder_softmax = nn.Sequential(
    nn.Linear(686, 256),
    nn.ReLU(True),
    nn.Linear(256, 2),
    nn.Softmax()  # <- needs to be removed
)
The output of the model should be the raw logits, without nn.Softmax.
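To see this concretely, here is a minimal sketch (standard PyTorch only, no names from the question assumed) showing that nn.CrossEntropyLoss already contains the log-softmax step, and how to recover probabilities at inference time if you need them:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 2)           # raw model outputs for a batch of 4
labels = torch.tensor([0, 1, 1, 0])  # target class indices

# CrossEntropyLoss applies log-softmax internally, so it is
# equivalent to log_softmax followed by NLLLoss:
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, nll))  # True

# At inference time, apply softmax explicitly if you need probabilities;
# for the predicted class alone, argmax over the logits is enough.
probs = F.softmax(logits, dim=1)
preds = logits.argmax(dim=1)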
You should also lower the learning rate: 0.98 is very high, which makes training far more unstable, and you will most likely see the loss oscillate. A more suitable learning rate is in the range of 0.01 or 0.001.
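Putting both fixes together, a minimal sketch of the corrected setup (the layer sizes and weight_decay are kept from the question; the class name Net and lr=1e-3 are my own choices):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Same layers as in the question, with the trailing Softmax removed:
        # the model now returns raw logits, which CrossEntropyLoss expects.
        self.encoder = nn.Sequential(
            nn.Linear(686, 256),
            nn.ReLU(True),
            nn.Linear(256, 2),
        )

    def forward(self, x):
        return self.encoder(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-5)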