PyTorch CNN won't learn; need help figuring out why

Asked: 2019-11-01 23:19:05

Tags: python conv-neural-network pytorch

I'm working on a CNN with PyTorch, but it won't learn and the accuracy won't improve. I made a version that uses the MNIST dataset so I can post it here. I'm just looking for an answer as to why it isn't working. The architecture is fine: I implemented it in Keras and had over 92% accuracy after 3 epochs. Note: I resize MNIST to 60x60 images because that's the image size in my "real" problem.

import numpy as np
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


def resize(pics):
    pictures = []
    for image in pics:
        image = Image.fromarray(image).resize((dim, dim))
        image = np.array(image)
        pictures.append(image)
    return np.array(pictures)


dim = 60

x_train, x_test = resize(x_train), resize(x_test) # because my real problem is in 60x60

x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255
x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255
y_train, y_test = y_train.astype('float32'), y_test.astype('float32') 

if torch.cuda.is_available():
    x_train = torch.from_numpy(x_train)[:10_000]
    x_test = torch.from_numpy(x_test)[:4_000] 
    y_train = torch.from_numpy(y_train)[:10_000] 
    y_test = torch.from_numpy(y_test)[:4_000]


class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5*5*128, 1024) 
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 1)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1) 
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        x = torch.sigmoid(self.fc3(x))
        return x


net = ConvNet()

optimizer = optim.Adam(net.parameters(), lr=0.03)

loss_function = nn.BCELoss()


class FaceTrain:

    def __init__(self):
        self.len = x_train.shape[0]
        self.x_train = x_train
        self.y_train = y_train

    def __getitem__(self, index):
        return x_train[index], y_train[index].unsqueeze(0)

    def __len__(self):
        return self.len


class FaceTest:

    def __init__(self):
        self.len = x_test.shape[0]
        self.x_test = x_test
        self.y_test = y_test

    def __getitem__(self, index):
        return x_test[index], y_test[index].unsqueeze(0)

    def __len__(self):
        return self.len


train = FaceTrain()
test = FaceTest()

train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in train_loader: 
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()        
        running_loss += loss.item()        
    else:
        test_loss = 0
        accuracy = 0        

        with torch.no_grad():
            for images, labels in test_loader: 
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)                
                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class.type('torch.LongTensor') == labels.type(torch.LongTensor).view(*top_class.shape)
                accuracy += torch.mean(equals.type('torch.FloatTensor'))
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

2 Answers:

Answer 0 (score: 7)

First, the major issues...

1. The main problem with this code is that you are using the wrong output shape and the wrong loss function for the classification.

nn.BCELoss computes the binary cross-entropy loss. It is applicable when you have one or more targets that are either 0 or 1 (hence binary). In your case the target is a single integer between 0 and 9. Since there are only a small number of possible target values, the most common approach is to use categorical cross-entropy loss (nn.CrossEntropyLoss). The "theoretical" definition of cross-entropy loss expects both the network outputs and the targets to be 10-dimensional vectors, where the target is all zeros except in one position (one-hot encoded). For computational stability and space efficiency, however, PyTorch's nn.CrossEntropyLoss takes the integer directly as the target, but you still need to give it a 10-dimensional output vector from your network.

# pseudo code (ignoring batch dimension)
loss = nn.functional.cross_entropy(<output 10d vector>, <integer target>)

To fix this in your code, we need fc3 to output a 10-dimensional feature vector, and we need the labels to be integers (not floats). Also, since PyTorch's cross-entropy loss applies log-softmax internally before computing the final loss value, there is no need to apply .sigmoid to the output of fc3.
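To make the expected shapes concrete, here is a small illustrative example (the tensors are made up for demonstration; a batch of 4 samples and 10 classes is assumed):

logits = torch.randn(4, 10)                    # raw network outputs: no softmax or sigmoid applied
targets = torch.tensor([3, 7, 0, 9])           # integer class labels (int64), not one-hot vectors
loss = nn.CrossEntropyLoss()(logits, targets)  # log-softmax + NLL are applied internally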

2. As Serget Dymchenko pointed out, you need to switch the network to eval mode during inference and back to train mode during training. This mainly affects the dropout and batch_norm layers, since they behave differently during training and inference.
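Roughly, the switch looks like this (an illustrative sketch only; `images` stands for any evaluation batch, and the 10-class head from the fix above is assumed):

net.train()                 # dropout active while fitting
# ... run one training epoch ...

net.eval()                  # dropout disabled for evaluation
with torch.no_grad():       # no gradient bookkeeping needed during inference
    predictions = net(images).argmax(dim=1)
net.train()                 # switch back before the next epoch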

3. A learning rate of 0.03 is probably a bit too high. Training works fine at a learning rate of 0.001; in a couple of experiments I saw training diverge at 0.03.


To accommodate these fixes a number of changes were needed. The minimal corrections to the code are shown below. I commented every changed line with #### followed by a brief description of the change.

import numpy as np
from PIL import Image  # needed by resize() below
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


def resize(pics):
    pictures = []
    for image in pics:
        image = Image.fromarray(image).resize((dim, dim))
        image = np.array(image)
        pictures.append(image)
    return np.array(pictures)


dim = 60

x_train, x_test = resize(x_train), resize(x_test) # because my real problem is in 60x60

x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255
x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255
#### float32 -> int64
y_train, y_test = y_train.astype('int64'), y_test.astype('int64')

#### no reason to test for cuda before converting to numpy

#### I assume you were taking a subset for debugging? No reason to not use all the data
x_train = torch.from_numpy(x_train)
x_test = torch.from_numpy(x_test)
y_train = torch.from_numpy(y_train)
y_test = torch.from_numpy(y_test)


class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5*5*128, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        #### 1 -> 10
        self.fc3 = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        #### removed sigmoid
        x = self.fc3(x)
        return x


net = ConvNet()

#### 0.03 -> 1e-3
optimizer = optim.Adam(net.parameters(), lr=1e-3)

#### BCELoss -> CrossEntropyLoss
loss_function = nn.CrossEntropyLoss()


class FaceTrain:

    def __init__(self):
        self.len = x_train.shape[0]
        self.x_train = x_train
        self.y_train = y_train

    def __getitem__(self, index):
        #### .unsqueeze(0) removed
        return x_train[index], y_train[index]

    def __len__(self):
        return self.len


class FaceTest:

    def __init__(self):
        self.len = x_test.shape[0]
        self.x_test = x_test
        self.y_test = y_test

    def __getitem__(self, index):
        #### .unsqueeze(0) removed
        return x_test[index], y_test[index]

    def __len__(self):
        return self.len


train = FaceTrain()
test = FaceTest()

train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    #### put net in train mode
    net.train()
    for idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0

        #### put net in eval mode
        net.eval()
        with torch.no_grad():
            for images, labels in test_loader:
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)
                #### removed torch.exp() since exponential is monotone, taking it doesn't change the order of outputs. Similarly with torch.softmax()
                top_p, top_class = log_ps.topk(1, dim=1)
                #### convert to float/long using proper methods. what you have won't work for cuda tensors.
                equals = top_class.long() == labels.long().view(*top_class.shape)
                accuracy += torch.mean(equals.float())
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

Results of training:

[Epoch: 1/10] [Training Loss: 0.139] [Test Loss: 0.046] [Test Accuracy: 0.986]
[Epoch: 2/10] [Training Loss: 0.046] [Test Loss: 0.042] [Test Accuracy: 0.987]
[Epoch: 3/10] [Training Loss: 0.031] [Test Loss: 0.040] [Test Accuracy: 0.988]
[Epoch: 4/10] [Training Loss: 0.022] [Test Loss: 0.029] [Test Accuracy: 0.990]
[Epoch: 5/10] [Training Loss: 0.017] [Test Loss: 0.066] [Test Accuracy: 0.987]
[Epoch: 6/10] [Training Loss: 0.015] [Test Loss: 0.056] [Test Accuracy: 0.985]
[Epoch: 7/10] [Training Loss: 0.018] [Test Loss: 0.039] [Test Accuracy: 0.991]
[Epoch: 8/10] [Training Loss: 0.012] [Test Loss: 0.057] [Test Accuracy: 0.988]
[Epoch: 9/10] [Training Loss: 0.012] [Test Loss: 0.041] [Test Accuracy: 0.991]
[Epoch: 10/10] [Training Loss: 0.007] [Test Loss: 0.048] [Test Accuracy: 0.992]

A few other issues that will improve your performance and your code:

4. You never move the model to the GPU, which means you won't be getting any GPU acceleration.

5. torchvision is designed around all the standard transforms and datasets and is built to be used with PyTorch. I recommend using it; it also removes the keras dependency from your code.

6. Normalize your data by subtracting the mean and dividing by the standard deviation to improve the performance of your network. With torchvision you can use transforms.Normalize. This won't make a big difference on MNIST, which is already easy, but it turns out to matter on harder problems (see the sketch after this list).
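The values 0.1307 and 0.3081 used in the code below are the commonly quoted MNIST mean and standard deviation. For a different single-channel dataset you can estimate them yourself; a minimal sketch, assuming the question's x_train tensor (already scaled to [0, 1]):

from torchvision import transforms

# Rough estimate of normalization statistics from the training images themselves.
# Assumes x_train is a float tensor of shape (N, 1, H, W) with values in [0, 1].
mean = x_train.mean().item()
std = x_train.std().item()
normalize = transforms.Normalize((mean,), (std,))  # one value per channel; a single channel here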


The further improved code is shown below (it is also much faster on a GPU).

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

dim = 60

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5 * 5 * 128, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        x = self.fc3(x)
        return x


net = ConvNet()
if torch.cuda.is_available():
    net.cuda()

optimizer = optim.Adam(net.parameters(), lr=1e-3)

loss_function = nn.CrossEntropyLoss()

train_dataset = MNIST('./data', train=True, download=True,
                      transform=transforms.Compose([
                          transforms.Resize((dim, dim)),
                          transforms.ToTensor(),
                          transforms.Normalize((0.1307,), (0.3081,))
                      ]))
test_dataset = MNIST('./data', train=False, download=True,
                     transform=transforms.Compose([
                         transforms.Resize((dim, dim)),
                         transforms.ToTensor(),
                         transforms.Normalize((0.1307,), (0.3081,))
                     ]))

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True, num_workers=8)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False, num_workers=8)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    net.train()
    for images, labels in train_loader:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0

        net.eval()
        with torch.no_grad():
            for images, labels in test_loader:
                if torch.cuda.is_available():
                    images, labels = images.cuda(), labels.cuda()
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)
                top_p, top_class = log_ps.topk(1, dim=1)
                equals = top_class.flatten().long() == labels
                accuracy += torch.mean(equals.float()).item()
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

Answer 1 (score: 4)

One thing I noticed is that you test the model in train mode. You need to call net.eval() to disable the dropouts (and then net.train() again to put it back into train mode).

Maybe there are other issues as well. Is the training loss going down? Have you tried overfitting a single example?
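One way to run that check, sketched against the corrected 10-class setup from the first answer (net, loss_function, optimizer and train_loader as defined there):

# Overfit a single fixed batch: if the model, loss and optimizer are wired
# correctly, the loss should fall close to zero within a few hundred steps.
images, labels = next(iter(train_loader))
for step in range(300):
    optimizer.zero_grad()
    loss = loss_function(net(images), labels)
    loss.backward()
    optimizer.step()
print(loss.item())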