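I have a training folder and a test folder containing images, but I would like to add a validation set using SubsetRandomSampler. The images in the train folder are organized in the standard ImageFolder format (one subfolder per class).
I tried to create a separate validation set with SubsetRandomSampler using the following code: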
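import os
import numpy as np
import torchvision
from torchvision import transforms
from torch.utils import data
from torch.utils.data.sampler import SubsetRandomSampler

batch_size = 20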
num_workers = 0
valid_size = 0.2
train_size = len(os.listdir(train_folder))
#print(train_size)
indices = list(range(train_size))
#print(indices)
np.random.shuffle(indices)
split = int(np.floor(valid_size * train_size))
train_idx, valid_idx = indices[split:], indices[:split]
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)
class_to_idx = {classes[i]: i for i in range(len(classes))}
#print(class_to_idx)
training_transform = torchvision.transforms.Compose([
    transforms.RandomRotation(75),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    #transforms.ColorJitter(brightness = 2, contrast = 0.5, saturation = 0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
test_transform = torchvision.transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
train_data = torchvision.datasets.ImageFolder(train_folder, transform=training_transform)
train_data_loader = data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)
for step, (tx, ty) in enumerate(train_data_loader, 0):
    print('---train_set_tensors---', tx.shape, ty)
valid_data = torchvision.datasets.ImageFolder(train_folder, transform=test_transform)
valid_data_loader = data.DataLoader(valid_data, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers)
for step, (tx, ty) in enumerate(valid_data_loader, 0):
    print('---valid_set_tensors---', tx.shape, ty)
test_data = torchvision.datasets.ImageFolder(test_folder, transform=test_transform)
test_data_loader = data.DataLoader(test_data, batch_size=batch_size, shuffle=True, num_workers=num_workers)
print('Test image information: ', test_data)
#for step, (tx, ty) in enumerate(test_data_loader, 0):
# print('---test_set_tensors---', tx.shape, ty)
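When I use
for step, (tx, ty) in enumerate(train_data_loader, 0):
    print('---train_set_tensors---', tx.shape, ty)
I can inspect the structure of each tensor. When I don't use SubsetRandomSampler and drop the validation set entirely, I get output for every tensor in the training set (75,750 images). However, when I add SubsetRandomSampler using the code above, I get the output below: the labels are all 0, and only a handful of batches are printed: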
Train image information: Dataset ImageFolder
Number of datapoints: 75750
Root location: food-101/train
StandardTransform
Transform: Compose(
RandomRotation(degrees=(-75, 75), resample=False, expand=False)
Resize(size=256, interpolation=PIL.Image.BILINEAR)
CenterCrop(size=(224, 224))
RandomHorizontalFlip(p=0.5)
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
---train_set_tensors--- torch.Size([20, 3, 224, 224]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
---train_set_tensors--- torch.Size([20, 3, 224, 224]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
---train_set_tensors--- torch.Size([20, 3, 224, 224]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
---train_set_tensors--- torch.Size([20, 3, 224, 224]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
---train_set_tensors--- torch.Size([1, 3, 224, 224]) tensor([0])
Valid image information: Dataset ImageFolder
Number of datapoints: 75750
Root location: food-101/train
StandardTransform
Transform: Compose(
Resize(size=256, interpolation=PIL.Image.BILINEAR)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
---valid_set_tensors--- torch.Size([20, 3, 224, 224]) tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
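Something isn't working properly. First, the training and validation datasets report the same number of datapoints, which shouldn't be the case. It also looks as if almost nothing is being loaded, and the label tensors that do appear are all 0. When I compare my code with SubsetRandomSampler examples online, I can't see any obvious mistake. What is wrong with my code?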
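One observation that may help narrow this down: the batches printed above add up to 81 training samples (4 × 20 + 1) and 20 validation samples, 101 in total, which is the number of class subfolders in food-101/train and not the number of images. The code builds its indices from len(os.listdir(train_folder)), and os.listdir only lists the entries directly under that folder. The following is a minimal sketch of that difference, not a confirmed diagnosis. It uses only the standard library; the temporary folder tree, the class names and the split_indices helper are hypothetical stand-ins for the real dataset and split code.

```python
import os
import random
import tempfile

# Hypothetical miniature of an ImageFolder layout: 3 class folders, 5 files each.
with tempfile.TemporaryDirectory() as root:
    for cls in ("apple_pie", "bibimbap", "churros"):
        os.makedirs(os.path.join(root, cls))
        for i in range(5):
            open(os.path.join(root, cls, f"{i}.jpg"), "w").close()

    # What the posted code measures: entries directly under the root,
    # i.e. the class subfolders.
    n_classes = len(os.listdir(root))

    # What the split presumably needs: the total number of image files.
    n_images = sum(len(files) for _, _, files in os.walk(root))


def split_indices(n_samples, valid_fraction, seed=0):
    """Shuffle 0..n_samples-1 and return (train_idx, valid_idx)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    split = int(valid_fraction * n_samples)
    return indices[split:], indices[:split]


train_idx, valid_idx = split_indices(n_images, 0.2)
print(n_classes, n_images, len(train_idx), len(valid_idx))  # prints: 3 15 12 3
```

If this is indeed the cause, then in the code above the sample count could presumably be taken from the dataset object itself, for example train_size = len(train_data) after the ImageFolder has been created, so that the indices cover all 75,750 images instead of only the first 101 (which all belong to class 0). Note also that "Number of datapoints: 75750" is printed for both datasets because a sampler only restricts which indices the DataLoader draws; it does not change the size the underlying ImageFolder reports.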