Question

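I have a folder containing 400 images. I want to split it into two folders: train_images and test_images.

train_images should contain 150 randomly selected images, all different from one another. test_images should also contain 150 randomly selected images, all different from one another and also different from the images selected for train_images.

I started by writing code that selects a random set of images from the Faces folder and puts them in the train_images folder. I need your help to get the behavior described above.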

clear all;
close all;
clc;


Train_images = 'train_faces';
mkdir(Train_images);

ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]); % drop '.' and '..' (and any subfolders)
totalNumberOfImages = length(ImageFiles);
scrambledList = randperm(totalNumberOfImages);
numberIWantToUse = 150;
for index = scrambledList(1:numberIWantToUse)
    baseFileName = ImageFiles(index).name;
    str = fullfile('Faces', baseFileName); % Better than STRCAT

    face = imread(str);

    imwrite(face, fullfile(Train_images, ['hello' num2str(index) '.jpg']));
end

Any help is greatly appreciated.

Answer 1

Assuming you have the Bioinformatics Toolbox, you can use crossvalind with the HoldOut option:

Here is an example. train and test are logical arrays, so you can use find to get the actual indices:

ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]); % drop '.' and '..' so only files are split
ImageFilesIndexes = ones(1, length(ImageFiles)); % use a numeric array instead of the char array
proportion = 150/400; % fraction held out for the testing set
[train, test] = crossvalind('HoldOut', ImageFilesIndexes, proportion);
training_files = ImageFiles(train); % about 250 files: it is better to use more data to train
testing_files = ImageFiles(test); % about 150 files

%Then do whatever you like with the files

Other possibilities are dividerand (Neural Network Toolbox) and cvpartition (Statistics Toolbox).
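As a rough illustration of the cvpartition route, the sketch below holds out 150 files for testing. It assumes the Statistics Toolbox is installed and that Faces contains only image files. Note that, like the crossvalind example, it leaves every remaining file (250 of 400) in the training set rather than exactly 150.

```matlab
% Sketch only: hold-out split with cvpartition (requires the Statistics Toolbox).
ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]);  % drop '.' and '..'

% An integer third argument is the number of observations placed in the test set.
c = cvpartition(numel(ImageFiles), 'HoldOut', 150);

training_files = ImageFiles(training(c));  % all files not held out
testing_files  = ImageFiles(test(c));      % the 150 held-out files
```

training(c) and test(c) return logical vectors, so find can be applied to them in the same way as with crossvalind.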

Answer 2

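Your code looks fine to me. To build the test set, you could re-run scrambledList = randperm(totalNumberOfImages); and then pick the first 150 elements of scrambledList as you did for training, although a fresh permutation may select images that are already in the training set.

To avoid that, you can instead reuse the same permutation and run the loop over its next 150 elements: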

for index = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse)
    % ... same thing you wrote in your training loop
end

With this approach, your test samples will be completely different from your training samples.
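Putting the pieces together, a minimal sketch of the single-permutation split might look like the following. The folder name test_faces is a hypothetical choice that does not appear in the question, and copyfile is used here in place of imread/imwrite on the assumption that the images only need to be copied unchanged.

```matlab
% Sketch only: disjoint training and test sets taken from ONE random permutation.
srcDir   = 'Faces';        % source folder from the question
trainDir = 'train_faces';  % training folder name used in the question
testDir  = 'test_faces';   % hypothetical name, not in the original post
numberIWantToUse = 150;

files = dir(srcDir);
files = files(~[files.isdir]);  % skip '.', '..' and any subfolders
assert(numel(files) >= 2*numberIWantToUse, 'Not enough images for both sets.');

scrambledList = randperm(numel(files));
trainIdx = scrambledList(1:numberIWantToUse);                       % first 150 entries
testIdx  = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse);  % next 150 entries

if ~exist(trainDir, 'dir'), mkdir(trainDir); end
if ~exist(testDir,  'dir'), mkdir(testDir);  end

for index = trainIdx
    copyfile(fullfile(srcDir, files(index).name), fullfile(trainDir, files(index).name));
end
for index = testIdx
    copyfile(fullfile(srcDir, files(index).name), fullfile(testDir, files(index).name));
end
```

Because randperm returns each index exactly once, two non-overlapping slices of the same permutation cannot share an image, which is why no explicit duplicate check is needed.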
