Question

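I am trying to split a large dataset, the Food101 dataset, into train/valid/test sets for image classification.

The dataset is structured as follows, with all images placed in a single folder: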

Structure:
----------
pec/
    images/
        <class_name>/
            <image_id>.jpg
    meta/
        classes.txt
        labels.txt
        test.json
        test.txt
        train.json
        train.txt

All images can be found in the "images" folder and are organized per class. All
image ids are unique and correspond to the foodspotting.com review ids.

The test/train splitting used in the experiment of our paper can be found in
the "meta" directory.



I want to divide the image dataset into train/valid/test sets using the lists of filenames given in train.txt and test.txt, which the authors used.

Shapes of the train, valid, and test lists: (101, 600), (101, 150), 25250
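For readers who want to reproduce lists of these shapes, here is a minimal sketch of one way to read a meta file and carve a validation set out of the training entries. It assumes each line of train.txt has the form `class_name/image_id` (the same format the code in this thread splits on `/`). The function names `read_split_file` and `split_train_valid` are hypothetical, and taking the first 150 entries of each class as the validation set is an assumption, not necessarily how the lists above were built.

```python
from collections import defaultdict


def read_split_file(path):
    """Read a meta file such as train.txt: one 'class_name/image_id' per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def split_train_valid(entries, n_valid_per_class=150):
    """Split entries into (train, valid), holding out a fixed number per class.

    Assumption: the first n_valid_per_class entries of each class go to valid.
    """
    by_class = defaultdict(list)
    for entry in entries:
        class_name = entry.split('/')[0]
        by_class[class_name].append(entry)

    train, valid = [], []
    for class_name in sorted(by_class):
        items = by_class[class_name]
        valid.extend(items[:n_valid_per_class])
        train.extend(items[n_valid_per_class:])
    return train, valid
```

With 750 training entries per class and `n_valid_per_class=150`, this would yield 600 train and 150 valid entries per class, matching the shapes above.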

In Colab, I run the following code:


for x in range(train.shape[0]):
    for y in range(train.shape[1]):
        temp = train[x, y] + ".jpg"
        foldername = temp.split('/')[0]
        !mv /content/food-101/images/$temp /content/food101/train/$foldername/

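Running a nested loop over the filenames in the lists and moving the images one at a time takes a very long time to populate the folders, because there are 101,000 images in total.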

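I have lists of filenames for the train, valid, and test sets, but how do I put the images into folders so that they can be fed to an image classifier in the PyTorch image folder format? By that I mean train, valid, and test are three different folders, and each of them has one subfolder per class.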
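To make the target layout concrete, here is a small sketch that walks a root directory arranged as `<split>/<class_name>/<image>` and counts the images in each class folder. The helper name `count_images` and the list of accepted extensions are assumptions made for illustration. It can serve as a sanity check once the folders have been populated.

```python
import os

# Assumption: only these extensions count as images.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def count_images(root):
    """Return {split: {class_name: image_count}} for a root/<split>/<class>/ layout."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        counts[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if os.path.isdir(class_dir):
                counts[split][class_name] = sum(
                    name.lower().endswith(IMAGE_EXTENSIONS)
                    for name in os.listdir(class_dir)
                )
    return counts
```

For the split described above, every class under train would be expected to report 600 images and every class under valid 150.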

Please let me know if anyone knows how to do this. I really need your help here. Thanks!

Answer 1

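It seems I was going about this all wrong. I don't need to move the images with a shell command per file; all I need to change is each image's path, into the required layout, through the os module.

Below is the code that does this. It assumes your list of filenames is in `valid`.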

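import os
from tqdm import tqdm

# for the valid set
v = valid.reshape(-1)  # flatten (101, 150) into 15150 'class_name/image_id' entries

or_fpath = '/content/food-101/images/'  # path of original folder
cp_fpath = '/content/food101/valid/'    # path of destination folder

for y in tqdm(v):
    foldername, image_id = y.split('/')
    img = image_id + '.jpg'

    ip_path = os.path.join(or_fpath, foldername)
    op_path = os.path.join(cp_fpath, foldername)

    # create the class folder (and any missing parent folders) on first use
    os.makedirs(op_path, exist_ok=True)

    # a rename within one filesystem only updates the path; no image data is copied
    os.rename(os.path.join(ip_path, img), os.path.join(op_path, img))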
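The same loop can be reused for the train and test lists. Below is a minimal sketch that wraps it in a function. The name `move_split` is hypothetical, and it assumes every entry has the form `class_name/image_id` and that the source and destination folders live on the same filesystem, so that `os.replace` is a cheap rename.

```python
import os


def move_split(entries, src_root, dst_root):
    """Move each 'class_name/image_id' entry from src_root to dst_root.

    Files end up at dst_root/<class_name>/<image_id>.jpg.
    Returns the number of files moved.
    """
    moved = 0
    for entry in entries:
        class_name, image_id = entry.split('/')
        filename = image_id + '.jpg'
        dst_dir = os.path.join(dst_root, class_name)
        # Create the class folder (and missing parents) if it is not there yet.
        os.makedirs(dst_dir, exist_ok=True)
        os.replace(
            os.path.join(src_root, class_name, filename),
            os.path.join(dst_dir, filename),
        )
        moved += 1
    return moved
```

Under these assumptions, a call such as `move_split(valid.reshape(-1), '/content/food-101/images/', '/content/food101/valid/')` would do the same work as the loop above, and the train and test lists can be passed in the same way with their own destination folders.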

Thanks!

Note: if you have a better answer, please share it. Thanks.

Given a list of image filenames for each image set, how do I split a large dataset into train/valid/test directories?
