Getting nan loss with the Adam optimizer in PyTorch

Asked: 2019-12-11 16:28:29

Tags: deep-learning pytorch adam

I am new to training neural networks, so please forgive me if this is a very silly question or breaks some unwritten Stack Overflow rule. I recently started working on the Titanic dataset. I cleaned the data and built a feature tensor by concatenating the normalized continuous columns with one-hot tensors of the categorical columns. I pass this data into a simple linear model, and the loss is nan in every epoch.
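For context, the preprocessing described above can be sketched roughly as follows. This is only an illustration with made-up values and a hypothetical categorical column, not the actual cleaning code from the question:

```python
import torch

# Hypothetical illustration of the preprocessing described above:
# normalize the continuous columns, one-hot encode a categorical column,
# then concatenate everything into a single feature tensor.
continuous = torch.tensor([[22.0, 7.25], [38.0, 71.28], [26.0, 7.92]])
continuous = (continuous - continuous.mean(dim=0)) / continuous.std(dim=0)

pclass = torch.tensor([2, 0, 2])  # categorical codes for one column
one_hot = torch.nn.functional.one_hot(pclass, num_classes=3).float()

features = torch.cat([continuous, one_hot], dim=1)
print(features.shape)  # one row per sample, continuous + one-hot columns
```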

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm
import pickle
import pathlib

path = pathlib.Path('./drive/My Drive/Kaggle/Titanic')

with open(path/'feature_tensor.pickle', 'rb') as f:
    features = pickle.load(f)

with open(path/'label_tensor.pickle', 'rb') as f:
    labels = pickle.load(f)

features = features.float()
labels = labels.float()

import math
valid_size = -1 * math.floor(0.2*len(features))

train_features = features[:valid_size]
valid_features = features[valid_size:]

train_labels = labels[:valid_size]
valid_labels = labels[valid_size:]

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.h_l1 = nn.Linear(18, 64)
        self.h_l2 = nn.Linear(64, 32)
        self.o_l = nn.Linear(32, 2)

    def forward(self, x):
        x = F.relu(self.h_l1(x))
        x = F.relu(self.h_l2(x))
        return self.o_l(x)

model = Model()
model.to('cuda')

optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

EPOCHS = 5
BATCH_SIZE = 20

for EPOCH in range(0, EPOCHS):
    for i in tqdm(range(0, len(features), BATCH_SIZE)):
        train_feature_batch = train_features[i:i+BATCH_SIZE,:].to('cuda')
        train_label_batch = train_labels[i:i+BATCH_SIZE,:].to('cuda')
        valid_feature_batch = valid_features[i:i+BATCH_SIZE,:].to('cuda')
        valid_label_batch = valid_labels[i:i+BATCH_SIZE,:].to('cuda')
        train_loss = loss_fn(model(train_feature_batch), train_label_batch)
        with torch.no_grad():
            valid_loss = loss_fn(model(valid_feature_batch), valid_label_batch)
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
    print(f"Epoch : {EPOCH}\tTrain Loss : {train_loss}\tValid_loss : {valid_loss}\n")

I get the following output:

100%|██████████| 45/45 [00:00<00:00, 511.50it/s]
100%|██████████| 45/45 [00:00<00:00, 604.10it/s]
100%|██████████| 45/45 [00:00<00:00, 586.21it/s]
  0%|          | 0/45 [00:00<?, ?it/s]Epoch : 0 Train Loss : nan    Valid_loss : nan

Epoch : 1   Train Loss : nan    Valid_loss : nan

Epoch : 2   Train Loss : nan    Valid_loss : nan

100%|██████████| 45/45 [00:00<00:00, 555.55it/s]
100%|██████████| 45/45 [00:00<00:00, 607.65it/s]Epoch : 3   Train Loss : nan    Valid_loss : nan

Epoch : 4   Train Loss : nan    Valid_loss : nan

Yes, the output really is interleaved like that. Any help would be appreciated.
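As a first diagnostic step (not an answer, just a general sanity check with synthetic stand-in data shaped like the tensors above), it is worth verifying that the inputs contain no NaN/inf values and that no batch slice is empty. Note in particular that a mean-reduced MSE over an empty slice is 0/0, i.e. nan, and slicing a tensor past its end silently yields an empty tensor rather than an error:

```python
import torch
import torch.nn as nn

def data_report(t):
    """Return (has_nan, has_inf) flags for a tensor."""
    return bool(torch.isnan(t).any()), bool(torch.isinf(t).any())

# Synthetic stand-ins shaped like the question's feature/label tensors
features = torch.randn(891, 18)
labels = torch.randint(0, 2, (891, 2)).float()
print(data_report(features), data_report(labels))

# Slicing past the end of a tensor gives an empty batch, and a
# mean-reduced MSE over zero elements evaluates to nan:
empty = features[900:920]
print(empty.shape)                  # 0 rows
print(nn.MSELoss()(empty, empty))   # nan
```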

0 Answers:

No answers yet