How can I randomly shuffle images and their corresponding masks before splitting?

Date: 2019-03-05 21:35:42

Tags: python split pytorch image-segmentation

I need to load the data in random order before splitting the dataset. Below is the snippet that splits the dataset; it is not randomized. How can I do this for the images and their corresponding masks in folder_mask?

import glob
import torch

folder_data = glob.glob("D:\\Neda\\Pytorch\\U-net\\my_data\\imagesResized\\*.png")
folder_mask = glob.glob("D:\\Neda\\Pytorch\\U-net\\my_data\\labelsResized\\*.png")

# split these paths using a certain percentage
len_data = len(folder_data)
print("count of dataset: ", len_data)
# count of dataset:  992

split_1 = int(0.6 * len(folder_data))
split_2 = int(0.8 * len(folder_data))

#folder_data.sort()

train_image_paths = folder_data[:split_1]
print("count of train images is: ", len(train_image_paths))

valid_image_paths = folder_data[split_1:split_2]
print("count of validation images is: ", len(valid_image_paths))

test_image_paths = folder_data[split_2:]
print("count of test images is: ", len(test_image_paths))

train_mask_paths = folder_mask[:split_1]
valid_mask_paths = folder_mask[split_1:split_2]
test_mask_paths = folder_mask[split_2:]

train_dataset = CustomDataset(train_image_paths, train_mask_paths)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=1,
                                           shuffle=True, num_workers=2)

valid_dataset = CustomDataset(valid_image_paths, valid_mask_paths)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=1,
                                           shuffle=True, num_workers=2)

test_dataset = CustomDataset(test_image_paths, test_mask_paths)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1,
                                          shuffle=False, num_workers=2)

dataLoaders = {
    'train': train_loader,
    'valid': valid_loader,
    'test': test_loader,
}

3 Answers:

Answer 0 (score: 1)

As I understand it, you want to randomize the order of the images so that each time you rerun the script, the train and test sets contain different pictures. Assuming you want to do this in more or less plain Python, you could do the following.

The simplest way to shuffle a list of elements in Python is:

import random
random.shuffle(my_list)  # shuffles in place

So you have two lists and want to keep the link between data and masks intact. If a quick hack is acceptable, I would suggest something like this:

import random

folder_data = glob.glob("D:\\Neda\\Pytorch\\U-net\\my_data\\imagesResized\\*.png")
folder_mask = glob.glob("D:\\Neda\\Pytorch\\U-net\\my_data\\labelsResized\\*.png") 

assert len(folder_data) == len(folder_mask)  # anything else would be bad

indices = list(range(len(folder_data)))
random.shuffle(indices)

Now you have a shuffled list of indices that you can split, and then use the indices from the split lists to access the other lists.

split_1 = int(0.6 * len(folder_data))
split_2 = int(0.8 * len(folder_data))

train_image_paths = [folder_data[i] for i in indices[:split_1]]
# and so on...

That would be the plain-Python way. But packages such as sklearn have functions for exactly this, so you might consider using those; they will save you a lot of work. (In general, reusing code is better than implementing it yourself.)
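A related plain-Python variant, shown here only as a sketch, is to shuffle the paired paths directly instead of shuffling indices (this variant is not part of the answer above, just one common alternative):

import random

# Pair each image path with its mask path, shuffle the pairs together,
# then unzip them back into two aligned lists.
paired = list(zip(folder_data, folder_mask))
random.shuffle(paired)
folder_data, folder_mask = map(list, zip(*paired))

After this, the original slicing with split_1 and split_2 can be used unchanged, since corresponding images and masks still share the same positions.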

Answer 1 (score: 0)

Try sklearn.model_selection.train_test_split:

from sklearn.model_selection import train_test_split 

train_image_paths, test_image_paths, train_mask_paths, test_mask_paths = \
    train_test_split(folder_data, folder_mask, test_size=0.2)

This will split your data and labels into corresponding train and test sets. If you need a validation set, you can use it twice: first split into train/test, then split the training subset again into train/validation.
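For instance, a sketch of the two-stage split just described (the 0.25 value is an assumption chosen so that the validation set ends up at 20% of the full data):

from sklearn.model_selection import train_test_split

# First split: hold out 20% of everything as the test set.
train_imgs, test_imgs, train_masks, test_masks = \
    train_test_split(folder_data, folder_mask, test_size=0.2)

# Second split: take 25% of the remaining 80% as validation (20% overall).
train_imgs, valid_imgs, train_masks, valid_masks = \
    train_test_split(train_imgs, train_masks, test_size=0.25)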

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Answer 2 (score: 0)

If I understand correctly, every sample has a mask, right? Use pandas to pair the data and masks, then split them randomly with a helper function:

import glob
import pandas as pd
import numpy as np

def train_validate_test_split(df, train_percent=.6, validate_percent=.2, seed=None):
    np.random.seed(seed)
    perm = np.random.permutation(df.index)
    m = len(df.index)
    train_end = int(train_percent * m)
    validate_end = int(validate_percent * m) + train_end
    train = df.loc[perm[:train_end]]        # select rows by permuted index labels
    validate = df.loc[perm[train_end:validate_end]]
    test = df.loc[perm[validate_end:]]
    return train, validate, test

folder_data = glob.glob("D:\\Neda\\Pytorch\\U-net\\my_data\\imagesResized\\*.png")
folder_mask = glob.glob("D:\\Neda\\Pytorch\\U-net\\my_data\\labelsResized\\*.png")

data_mask = pd.DataFrame({"data": folder_data, "mask": folder_mask})

train, validate, test = train_validate_test_split(data_mask)
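From here, the split DataFrames can be fed back into the question's CustomDataset, for example (a sketch, assuming CustomDataset takes lists of image and mask paths as in the question):

train_dataset = CustomDataset(train["data"].tolist(), train["mask"].tolist())
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=1,
                                           shuffle=True, num_workers=2)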

Credit for the helper function goes to @piRSquared's answer in this question.
