Question

I'm trying to load a dataset for a classification task with PyTorch. This is the code I used:

import os
import torch
from torchvision import datasets, transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation(2.8),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5), (0.5))
    ]),
    'valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.5), (0.5))
    ])
}
print(os.listdir())
# TODO: Load the datasets with ImageFolder
image_datasets = {x: datasets.ImageFolder(os.path.join("/content/drive/MyDrive/DatasetPersonale", x),
                                          data_transforms[x])
                  for x in ['train', 'valid']}
# TODO: Using the image datasets and the transforms, define the dataloaders
batch_size = 32
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'valid']}
class_names = image_datasets['train'].classes
print(class_names)
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid']}

The code worked fine, but since my dataset is grayscale I needed to convert it to RGB, so I used the following code:

import os
from PIL import Image

rootdir = '/content/drive/MyDrive/DatasetPersonale/trainRGB'
print("Train")
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filePath = os.path.join(subdir, file)
        name = os.path.basename(filePath)
        img = Image.open(filePath, mode="r")
        print(img.mode)
        # Overwrite any non-RGB image with an RGB copy of itself
        if img.mode != "RGB":
            RGBimg = img.convert("RGB")
            RGBimg.save(filePath, format="JPEG")

Now my images are still JPEG, but they are in RGB mode instead of L. The problem is that when I rerun the code that loads the dataset, I get an error.

Does anyone know why this error occurs? I checked the extensions of all the files, and they are all jpeg.

Thanks.

Answer 1

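Problem: The folder /content/drive/MyDrive/DatasetPersonale/trainRGB contains a .ipynb_checkpoints folder, and the files inside it are not valid images: none of them has a valid extension (.jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp). ImageFolder treats every subfolder of the root as a class, so it also tries to read .ipynb_checkpoints.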
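To check whether a dataset root is affected, a small diagnostic along the following lines may help. It is only a sketch using the standard library: the function name find_dirs_without_images is made up for this example, and the extension list is copied from the answer above rather than read from torchvision.

```python
import os

# Extensions listed in the answer above as valid image extensions.
VALID_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp",
                    ".pgm", ".tif", ".tiff", ".webp")


def find_dirs_without_images(root):
    """Return the immediate subfolders of root that hold no valid image file.

    Hypothetical helper: each returned name is a folder that would be
    picked up as a class without containing any usable image.
    """
    suspicious = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # plain files in the root are not class folders
        has_image = any(
            filename.lower().endswith(VALID_EXTENSIONS)
            for _, _, files in os.walk(path)
            for filename in files
        )
        if not has_image:
            suspicious.append(name)
    return suspicious
```

Calling find_dirs_without_images('/content/drive/MyDrive/DatasetPersonale/trainRGB') before building the dataset should list .ipynb_checkpoints if that folder is the culprit.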

Solution: Keep all the images in a subfolder, e.g. "images", and change the root folder to /content/drive/MyDrive/DatasetPersonale/trainRGB/images so that the .ipynb_checkpoints folder, which contains no valid images, is not read.
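A different option, not the one proposed in this answer, is to delete the checkpoint folders instead of moving the images. The sketch below assumes it is acceptable to lose the notebook checkpoints stored there. The helper name remove_checkpoint_dirs is hypothetical.

```python
import os
import shutil


def remove_checkpoint_dirs(root, name=".ipynb_checkpoints"):
    """Delete every folder called `name` under root; return the deleted paths.

    Hypothetical helper, shown as an alternative to moving the images.
    """
    removed = []
    for current, dirs, _ in os.walk(root):
        if name in dirs:
            target = os.path.join(current, name)
            shutil.rmtree(target)
            dirs.remove(name)  # do not descend into the folder just deleted
            removed.append(target)
    return removed
```

Running remove_checkpoint_dirs('/content/drive/MyDrive/DatasetPersonale/trainRGB') right before creating the ImageFolder is one way to use it. The notebook environment may recreate the folder later, so the call may need repeating.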

Problem with PyTorch datasets.ImageFolder and a custom dataset in Google Colab
