Question

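I'm working with a dataset in which I need to find the accuracy on the classes that have fewer than 20 samples. So first I use PyTorch's ImageFolder to load all the images in the folder.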

from torchvision.datasets import ImageFolder

dataset = ImageFolder('/content/drive/MyDrive/data/Dataset/')

Now I pick out the classes that have fewer than 20 samples:

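def get_class_distribution(dataset_obj):
    # Map label index -> class (folder) name
    idx2class = {v: k for k, v in dataset_obj.class_to_idx.items()}
    count_dict = {k: 0 for k in dataset_obj.class_to_idx}

    # ImageFolder stores every sample's label in .targets, so the classes
    # can be counted without loading and decoding each image.
    for y_lbl in dataset_obj.targets:
        count_dict[idx2class[y_lbl]] += 1

    return count_dict

# print("Distribution of classes: \n", get_class_distribution(dataset))
class_distribution = get_class_distribution(dataset)

# Names of the classes with fewer than 20 samples
sampled_classes = [cls for cls, samples in class_distribution.items() if samples < 20]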

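I get the list of classes correctly, but I'm not sure how to proceed with inference. How can I convert/update this into an ImageFolder so that I can use the filtered dataset in the following code: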

import torch
from tqdm import tqdm

# Test model performance for classes with fewer than 20 samples.
# Assumes `model`, `device` and `data_loader` are already defined.
model.eval()  # disable dropout, use running batch-norm statistics
y_pred_list = []
y_true_list = []
with torch.no_grad():
    for x_batch, y_batch in tqdm(data_loader):
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        y_test_pred = model(x_batch)
        _, y_pred_tag = torch.max(y_test_pred, dim=1)
        y_pred_list.append(y_pred_tag.cpu().numpy())
        y_true_list.append(y_batch.cpu().numpy())
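The loop above only collects predictions. A minimal sketch for turning such lists into overall and per-class accuracy, assuming `y_pred_list` and `y_true_list` hold one array (or list) of class indices per batch; the helper name `accuracy_from_batches` is hypothetical:

```python
from collections import defaultdict


def accuracy_from_batches(y_pred_list, y_true_list):
    """Return (overall_accuracy, {class_index: accuracy}).

    Each argument is a list with one sequence of class indices per batch,
    e.g. the NumPy arrays appended in the evaluation loop.
    """
    correct = defaultdict(int)  # class index -> correct predictions
    total = defaultdict(int)    # class index -> samples seen
    for pred_batch, true_batch in zip(y_pred_list, y_true_list):
        for pred, true in zip(pred_batch, true_batch):
            true = int(true)
            total[true] += 1
            correct[true] += int(pred == true)
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_class = {c: correct[c] / total[c] for c in sorted(total)}
    return overall, per_class


# Toy example: two batches of predictions and labels
toy_pred = [[0, 1, 1], [2, 2]]
toy_true = [[0, 1, 2], [2, 1]]
overall, per_class = accuracy_from_batches(toy_pred, toy_true)
print(overall)    # 0.6
print(per_class)  # {0: 1.0, 1: 0.5, 2: 0.5}
```

The per-class keys are label indices; mapping them back through `{v: k for k, v in dataset.class_to_idx.items()}` gives the folder names, so the rare classes can be read off directly.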

Answer 1

You don't need the first block; use this instead:

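import torch
from torchvision import datasets
from tqdm import tqdm

# test_transforms, model and device are assumed to be defined elsewhere.
test_data = datasets.ImageFolder('test/', transform=test_transforms)
data_loader = torch.utils.data.DataLoader(test_data, batch_size=16)

model.eval()
accuracy = 0.0  # running sum of per-batch accuracies
with torch.no_grad():
    for x_batch, y_batch in tqdm(data_loader):
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        y_test_pred = model(x_batch)
        top_p, top_class = y_test_pred.topk(1, dim=1)  # top-1 class per sample
        equals = top_class == y_batch.view(*top_class.shape)
        accuracy += equals.float().mean().item()

# Mean of per-batch accuracies as a percentage
# (slightly off if the last batch is smaller than the others)
print(accuracy / len(data_loader) * 100)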
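The answer above evaluates a separate `test/` folder rather than the filtered classes. A minimal sketch for keeping only the samples whose class is in `sampled_classes`, assuming the dataset exposes `targets` and `class_to_idx` the way `ImageFolder` does; the helper `indices_for_classes` is hypothetical:

```python
def indices_for_classes(targets, class_to_idx, class_names):
    """Return the positions of all samples whose label is in class_names.

    targets:      integer label of every sample, e.g. ImageFolder.targets
    class_to_idx: folder name -> label, e.g. ImageFolder.class_to_idx
    class_names:  names of the classes to keep, e.g. sampled_classes
    """
    keep = {class_to_idx[name] for name in class_names}
    return [i for i, label in enumerate(targets) if label in keep]


# Toy stand-ins for dataset.targets and dataset.class_to_idx
targets = [0, 0, 1, 2, 2, 2, 1]
class_to_idx = {"cat": 0, "dog": 1, "fox": 2}
print(indices_for_classes(targets, class_to_idx, ["dog", "cat"]))  # [0, 1, 2, 6]
```

With the real dataset, the indices can be passed to `torch.utils.data.Subset(dataset, indices)` and the `DataLoader` built from that subset. The labels keep their original indices, so they still line up with the outputs of a model trained on all classes. The `ImageFolder` also needs a `transform` (for example the same `test_transforms`), since the default collate function cannot batch raw PIL images.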

How do I update my ImageFolder dataset in PyTorch?
