RuntimeError when reading data from the training dataset in PyTorch

Date: 2020-02-28 12:58:33

Tags: python runtime-error pytorch conv-neural-network training-data

I have a data sample in my training dataset. If I print the data I can view it, but when accessing it to train on, I keep getting RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight' in call to _thnn_conv2d_forward. I cannot figure out why this happens. I have also attached a screenshot at the end to better illustrate the error message.

The labels.txt file looks like this (image names linking to images in another folder, with the corresponding center point (x, y) and radius):

0000,   0.67 ,   0.69 ,   0.26 
0001,   0.69 ,   0.33 ,   0.3  
0002,   0.16 ,   0.27 ,   0.15 
0003,   0.54 ,   0.33 ,   0.17 
0004,   0.32 ,   0.45 ,   0.3  
0005,   0.78 ,   0.26 ,   0.17 
0006,   0.44 ,   0.49 ,   0.19 
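
The ShapesDataset code is not shown, but a minimal sketch of parsing rows in the format above (name, x, y, radius) into a float32 tensor would look like the following; loading the targets as float32 up front means they already match the dtype of the network's float32 weights. The `sample` string here is just the first two rows copied from above:

    # Hypothetical sketch: parse labels.txt rows into a float32 tensor.
    import io
    import torch

    sample = "0000,   0.67 ,   0.69 ,   0.26\n0001,   0.69 ,   0.33 ,   0.3"

    names, rows = [], []
    for line in io.StringIO(sample):
        name, x, y, r = [part.strip() for part in line.split(",")]
        names.append(name)
        rows.append([float(x), float(y), float(r)])

    labels = torch.tensor(rows, dtype=torch.float32)  # shape (N, 3)
    print(labels.dtype)  # torch.float32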

Edit: here are the optimizer and the loss function I am using:

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

My model validation function is as follows:

def validate_model(model, loader):
    model.eval() # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
                  # (dropout is set to zero)

    val_running_loss = 0.0
    val_running_correct = 0

    for i, data in enumerate(loader):  # avoid "int" as a loop variable; it shadows the builtin
        data, target = data['image'].to(device), data['labels'].to(device)
        output = model(data)
        loss = my_loss(output, target)

        val_running_loss = val_running_loss + loss.item()
        _, preds = torch.max(output.data, 1)

        val_running_correct = val_running_correct + (preds == target).sum().item()

    avg_loss = val_running_loss/len(loader.dataset)
    val_accuracy = 100. * val_running_correct/len(loader.dataset)

    #----------------------------------------------
    # implementation needed here 
    #----------------------------------------------
    return avg_loss, val_accuracy

I have a fit function that computes the training loss:

def fit(model, train_dataloader):
    model.train()
    train_running_loss = 0.0
    train_running_correct = 0
    for i, data in enumerate(train_dataloader):
        print(data)
        #I believe this is causing the error, but not sure why.
        data, target = data['image'].to(device), data['labels'].to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = my_loss(output, target)
        train_running_loss = train_running_loss + loss.item()
        _, preds = torch.max(output.data, 1)
        train_running_correct =  train_running_correct + (preds == target).sum().item()
        loss.backward()
        optimizer.step()
    train_loss = train_running_loss/len(train_dataloader.dataset)
    train_accuracy = 100. * train_running_correct/len(train_dataloader.dataset)

    print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_accuracy:.2f}')

    return train_loss, train_accuracy

and the train_model function below, which stores the losses and accuracies in lists:

train_losses , train_accuracy = [], []
validation_losses , val_accuracy = [], []

def train_model(model,
                optimizer,
                train_loader,
                validation_loader,
                train_losses,
                validation_losses,
                epochs=1):

    """
    Trains a neural network. 
    Args:
        model               - model to be trained
        optimizer           - optimizer used for training
        train_loader        - loader from which data for training comes 
        validation_loader   - loader from which data for validation comes (maybe at the end, you use test_loader)
        train_losses        - adding train loss value to this list for future analysis
        validation_losses   - adding validation loss value to this list for future analysis
        epochs              - number of runs over the entire data set 
    """

    #----------------------------------------------
    # implementation needed here 
    #----------------------------------------------

    for epoch in range(epochs):
        train_epoch_loss, train_epoch_accuracy = fit(model, train_loader)
        val_epoch_loss, val_epoch_accuracy = validate_model(model, validation_loader)
        train_losses.append(train_epoch_loss)
        train_accuracy.append(train_epoch_accuracy)
        validation_losses.append(val_epoch_loss)
        val_accuracy.append(val_epoch_accuracy)

    return

When I run the following code, I get the runtime error:

train_model(model, 
            optimizer,
            train_loader, 
            validation_loader, 
            train_losses, 
            validation_losses,
            epochs=2)

Error: RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight' in call to _thnn_conv2d_forward
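
For reference, this error can be reproduced in isolation: PyTorch layer weights are float32 by default, so feeding a float64 (Double) batch into a convolution raises the same dtype mismatch, and casting the input with .float() resolves it:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(3, 12, kernel_size=3, padding=1)     # weights are float32
    x = torch.rand(1, 3, 128, 128, dtype=torch.float64)   # Double input

    try:
        conv(x)                  # raises RuntimeError: dtype mismatch
    except RuntimeError as e:
        print("raised:", e)

    out = conv(x.float())        # cast the batch before the forward pass
    print(out.dtype)             # torch.float32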

Below is a screenshot of the error message: ERROR

Edit: this is what my model looks like. The task is to detect the circle in each image, whose center and radius are given in the labels.txt file, and to draw it. The drawing function was provided; I created the model and the training and validation code.

class CircleNet(nn.Module):    # nn.Module is parent class  
    def __init__(self):
        super(CircleNet, self).__init__()  #calls init of parent class
        #----------------------------------------------
        # implementation needed here 
        #----------------------------------------------
        #keep dimensions of input image: (I-F+2P)/S +1= (128-3+2)/1 + 1 = 128

        #RGB image = input channels = 3. Use 12 filters for first 2 convolution layers, then double
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(in_channels=24, out_channels=32, kernel_size=3, stride=1, padding=1)

        #Pooling to reduce sizes, and dropout to prevent overfitting
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU()

        self.drop = nn.Dropout2d(p=0.25)
        self.norm1 = nn.BatchNorm2d(12)
        self.norm2 = nn.BatchNorm2d(24)

        # There are 2 pooling layers, each with kernel size of 2. Output size: 128/(2*2) = 32
        # Have 3 output features, corresponding to x-pos, y-pos, radius. 
        self.fc = nn.Linear(in_features=32 * 32 * 32, out_features=3)

    def forward(self, x):
        """
        Feed forward through network
        Args:
            x - input to the network

        Returns "x", which is the network's output
        """

        #----------------------------------------------
        # implementation needed here 
        #----------------------------------------------
        #Conv1
        out = self.conv1(x)
        out = self.pool(out)
        out = self.relu(out)
        out = self.norm1(out)
        #Conv2
        out = self.conv2(out)
        out = self.pool(out)
        out = self.relu(out)
        out = self.norm1(out)
        #Conv3
        out = self.conv3(out)
        out = self.drop(out)
        #Conv4
        out = self.conv4(out)
        out = F.dropout(out, training=self.training)
        out = out.view(-1, 32 * 32 * 32)
        out = self.fc(out)


        return out
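
The size arithmetic in the comments can be checked with an untrained stand-in that mirrors the conv/pool stack above: the two MaxPool2d(2) layers shrink 128x128 to 32x32, and the last conv has 32 channels, giving the 32 * 32 * 32 flattened size used by self.fc:

    import torch
    import torch.nn as nn

    stack = nn.Sequential(
        nn.Conv2d(3, 12, 3, padding=1), nn.MaxPool2d(2),   # 128 -> 64
        nn.Conv2d(12, 12, 3, padding=1), nn.MaxPool2d(2),  # 64 -> 32
        nn.Conv2d(12, 24, 3, padding=1),
        nn.Conv2d(24, 32, 3, padding=1),
    )

    with torch.no_grad():
        out = stack(torch.rand(4, 3, 128, 128))
    print(out.shape)  # torch.Size([4, 32, 32, 32])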

Edit: would this be helpful as my custom loss function?

criterion = nn.CrossEntropyLoss()

def my_loss(outputs, labels):

    """
    Args:
        outputs - output of network ([batch size, 3]) 
        labels  - desired labels  ([batch size, 3])
    """

    loss = torch.zeros(1, dtype=torch.float, requires_grad=True)
    loss = loss.to(device)

    loss = criterion(outputs, labels)

    #----------------------------------------------
    # implementation needed here 
    #----------------------------------------------

    # Observe: If you need to iterate and add certain values to loss defined above
    # you cannot write: loss +=... because this will raise the error: 
    # "Leaf variable was used in an inplace operation"
    # Instead, to avoid this error write: loss = loss + ...       

    return loss
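
One thing worth noting here (a hedged observation, not the assignment's required loss): nn.CrossEntropyLoss expects integer class indices as targets, while both outputs and labels in this problem are (batch, 3) continuous values (x, y, radius). A regression loss such as nn.MSELoss accepts those shapes directly:

    import torch
    import torch.nn as nn

    criterion = nn.MSELoss()

    outputs = torch.rand(8, 3)  # stand-in for the network output
    labels = torch.rand(8, 3)   # stand-in for (x, y, radius) targets

    loss = criterion(outputs, labels)
    print(loss.shape)  # torch.Size([]) -- a scalar loss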

The data loaders (given to me):

train_dir      = "./train/"
validation_dir = "./validation/"
test_dir       = "./test/"


train_dataset = ShapesDataset(train_dir)

train_loader = DataLoader(train_dataset, 
                          batch_size=32,
                          shuffle=True)



validation_dataset = ShapesDataset(validation_dir)

validation_loader = DataLoader(validation_dataset, 
                               batch_size=1,
                               shuffle=False)



test_dataset = ShapesDataset(test_dir)

test_loader = DataLoader(test_dataset, 
                          batch_size=1,
                          shuffle=False)


print("train loader examples     :", len(train_dataset)) 
print("validation loader examples:", len(validation_dataset))
print("test loader examples      :", len(test_dataset))

Edit: the code for viewing images, target circle labels, and network outputs was also given:

"""
View the first image of a given number of batches, assuming that a model has been created. 
Currently, the lines that assume a model has been created are commented out. Without a model, 
you can view the target labels and the corresponding images.
This is given to you so that you may see how the loaders and model can be used. 
"""

loader = train_loader # choose from which loader to show images
batches_to_show = 2
with torch.no_grad():
    for i, data in enumerate(loader, 0): #0 means that counting starts at zero
        inputs = (data['image']).to(device)   # has shape (batch_size, 3, 128, 128)
        labels = (data['labels']).to(device)  # has shape (batch_size, 3)
        img_fnames = data['fname']            # list of length batch_size

        #outputs = model(inputs.float())
        img = Image.open(img_fnames[0])

        print ("showing image: ", img_fnames[0])

        labels_str = [float("{0:.2f}".format(x)) for x in labels[0]]

        #outputs_np_arr = outputs[0] # using ".numpy()" to convert tensor to numpy array
        #outputs_str = [ float(("{0:.2f}".format(x))) for x in outputs_np_arr]
        print("Target labels :", labels_str )
        #print("network coeffs:", outputs_str)
        print()
        #img.show()

        if (i+1) == batches_to_show:
            break

This is the output I am getting, and the drawn circle should cover the entire circle: Output I am getting. Any ideas would be helpful.

1 Answer:

Answer 0 (score: 0):

I basically added this (in both the validate_model and fit functions):

    _, target = torch.max(target.data, 1)

below the line _, preds = torch.max(output.data, 1), so that the outputs and targets have the same length. I also changed the loss function from CrossEntropyLoss to MSELoss.

Then, keeping the same functions, I changed the line output = model(data) to output = model(data.float()).