Question

我正在尝试在PyTorch中实现转移学习方法。这是我正在使用的数据集：Dog-Breed

这是我要遵循的步骤。

1. Load the data and read csv using pandas.
2. Resize (60, 60) the train images and store them as numpy array.
3. Apply stratification and split the train data into 7:1:2 (train:validation:test)
4. use the resnet18 model and train.

数据集的位置

LABELS_LOCATION = './dataset/labels.csv'
TRAIN_LOCATION = './dataset/train/'
TEST_LOCATION = './dataset/test/'
ROOT_PATH = './dataset/'

阅读CSV（labels.csv）

def read_csv(csvf):
    # print(pandas.read_csv(csvf).values)
    data=pandas.read_csv(csvf).values
    labels_dict = dict(data)
    idz=list(labels_dict.keys())
    clazz=list(labels_dict.values())
    return labels_dict,idz,clazz

我之所以这样做是因为我在接下来使用DataLoader加载数据时会提到一个约束。

def class_hashmap(class_arr):
    uniq_clazz = Counter(class_arr)
    class_dict = {}
    for i, j in enumerate(uniq_clazz):
        class_dict[j] = i
    return class_dict

labels, ids, class_names = read_csv(LABELS_LOCATION)
train_images = os.listdir(TRAIN_LOCATION)
class_numbers = class_hashmap(class_names)

接下来，我使用opencv将图像调整为60,60，并将结果存储为numpy数组。

resize = []
indexed_labels = []
for t_i in train_images:
    # resize.append(transform.resize(io.imread(TRAIN_LOCATION+t_i), (60, 60, 3)))  # (60,60) is the height and widht; 3 is the number of channels
    resize.append(cv2.resize(cv2.imread(TRAIN_LOCATION+t_i), (60, 60)).reshape(3, 60, 60))
    indexed_labels.append(class_numbers[labels[t_i.split('.')[0]]])

resize = np.asarray(resize)
print(resize.shape)

在indexed_labels中，我给每个标签一个数字。

接下来，我将数据拆分为7：1：2部分

X = resize  # numpy array of images [training data]
y = np.array(indexed_labels)  # indexed labels for images [training labels]

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
sss.get_n_splits(X, y)


for train_index, test_index in sss.split(X, y):
    X_temp, X_test = X[train_index], X[test_index]  # split train into train and test [data]
    y_temp, y_test = y[train_index], y[test_index]  # labels

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.123, random_state=0)
sss.get_n_splits(X_temp, y_temp)

for train_index, test_index in sss.split(X_temp, y_temp):
    print("TRAIN:", train_index, "VAL:", test_index)
    X_train, X_val = X[train_index], X[test_index]  # training and validation data
    y_train, y_val = y[train_index], y[test_index]  # training and validation labels

接下来，我将上一步中的数据加载到火炬DataLoaders

中

batch_size = 500
learning_rate = 0.001

train = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=False)

val = torch.utils.data.TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_val))
val_loader = torch.utils.data.DataLoader(val, batch_size=batch_size, shuffle=False)

test = torch.utils.data.TensorDataset(torch.from_numpy(X_test), torch.from_numpy(y_test))
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False)

# print(train_loader.size)

dataloaders = {
    'train': train_loader,
    'val': val_loader
}

接下来，我加载预训练的rensnet模型。

model_ft = models.resnet18(pretrained=True)

# freeze all model parameters
# for param in model_ft.parameters():
#     param.requires_grad = False

num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, len(class_numbers))

if use_gpu:
    model_ft = model_ft.cuda()
    model_ft.fc = model_ft.fc.cuda()

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

然后我使用了train_model，这是PyTorch文档中描述的here方法。

然而，当我运行此操作时，我收到错误。

Traceback (most recent call last):
  File "/Users/nirvair/Sites/pyTorch/TL.py",
    line 244, in <module>
        num_epochs=25)
      File "/Users/nirvair/Sites/pyTorch/TL.py", line 176, in train_model
        outputs = model(inputs)
      File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/Library/Python/2.7/site-packages/torchvision/models/resnet.py", line 149, in forward
        x = self.avgpool(x)
      File "/Library/Python/2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/Library/Python/2.7/site-packages/torch/nn/modules/pooling.py", line 505, in forward
        self.padding, self.ceil_mode, self.count_include_pad)
      File "/Library/Python/2.7/site-packages/torch/nn/functional.py", line 264, in avg_pool2d
        ceil_mode, count_include_pad)
      File "/Library/Python/2.7/site-packages/torch/nn/_functions/thnn/pooling.py", line 360, in forward
        ctx.ceil_mode, ctx.count_include_pad)
    RuntimeError: Given input size: (512x2x2). Calculated output size: (512x0x0). Output size is too small at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/SpatialAveragePooling.c:64

我似乎无法弄清楚这里出了什么问题。

Answer 1

您的网络太深，无法满足您正在使用的图像尺寸（60x60）。如您所知，当输入图像通过图层传播时，CNN图层会生成越来越小的要素图。这是因为您没有使用填充。

您所犯的错误只是说下一层需要512个大小为2像素乘2像素的要素图。从正向传递产生的实际特征映射是512个大小为0x0的映射。这种不匹配是导致错误的原因。

通常，所有库存网络（例如RESNET-18，Inception等）都要求输入图像的大小为224x224（至少）。使用torchvision transforms [1]可以更轻松地完成此操作。您也可以使用较大的图像尺寸，但AlexNet具有一个例外，它具有硬编码的特征向量大小，如我在[2]中的回答中所述。

额外提示：如果您在预设模式下使用网络，则需要使用[3]中pytorch文档中的参数对数据进行白化。

<强>链接

使用PyTorch转移学习[resnet18]。数据集：犬种识别

1 个答案: