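I have two image datasets. Each has subjects 1-200, and each subject has c images (e.g. c=8). I now want to split these datasets into training and test sets for my algorithm. I typically want to do this in the following cases:

Case 1. Use k1 images (k1<c) per subject for training and k2 images (k2<c and k1+k2<=c) per subject for testing. The training set then has k1*200 images and the test set has k2*200 images. The subjects overlap completely between the training and test sets. Because the same subjects appear in both sets, the k1 training images and the k2 test images of a subject must not overlap: for example, with k1=3 and k2=3, pick any 3 images of each subject for training and any other 3 images of the same subject for testing. This is why the constraint k1+k2<=c is necessary.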
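A minimal sketch of the first case (same subjects in both sets), assuming only a label vector is available. The toy sizes and the names labels, train_idx and test_idx are made up for illustration and are not part of the real data. Images are looked up by label, so the sketch does not depend on how the images are ordered in storage.

```matlab
% Hypothetical toy data: 4 subjects with c = 8 images each (not the real set).
c = 8;
labels = repelem(1:4, c);            % labels(n) = subject of image n
k1 = 3;  k2 = 3;                     % images per subject for train / test
assert(k1 + k2 <= c, 'k1 + k2 must not exceed c');

subjects  = unique(labels);
train_idx = zeros(1, numel(subjects)*k1);
test_idx  = zeros(1, numel(subjects)*k2);
for s = 1:numel(subjects)
    members = find(labels == subjects(s));        % images of this subject
    members = members(randperm(numel(members)));  % shuffle them
    train_idx((s-1)*k1 + (1:k1)) = members(1:k1);        % first k1 -> train
    test_idx((s-1)*k2 + (1:k2))  = members(k1+1:k1+k2);  % next k2  -> test
end
```

The index vectors can then be applied to the data and labels alike, e.g. a_data(train_idx) and a_labels(train_idx).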
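Case 2. The training set consists of t subjects and the test set consists of the other 200-t subjects, so the subjects in the two sets do not overlap at all. Randomly select k1 images (k1<c) from each of the t training subjects, and k2 images from each of the 200-t test subjects. The training set then has k1*t images and the test set has k2*(200-t) images. Note that k1+k2 need not equal c, and k1=k2 is possible. Because different subjects are used for training and testing, the choices for k1 and k2 may overlap, so the constraint k1+k2<=c is not required.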
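A minimal sketch of the subject-disjoint case, again on made-up toy data: the subjects are shuffled once, the first t go to training and the rest to testing. All names and sizes are illustrative assumptions. Here k1+k2 deliberately exceeds c to show that the constraint is not needed.

```matlab
% Hypothetical toy data: 10 subjects with c = 8 images each.
c = 8;  n_subj = 10;
labels = repelem(1:n_subj, c);       % labels(n) = subject of image n
t  = 6;                              % number of training subjects
k1 = 5;  k2 = 5;                     % k1 + k2 > c is allowed in this case

subjects = unique(labels);
order = subjects(randperm(numel(subjects)));   % random subject order
train_idx = [];  test_idx = [];
for s = 1:numel(order)
    members = find(labels == order(s));           % images of this subject
    members = members(randperm(numel(members)));  % shuffle them
    if s <= t
        train_idx = [train_idx, members(1:k1)];   %#ok<AGROW>
    else
        test_idx  = [test_idx,  members(1:k2)];   %#ok<AGROW>
    end
end
```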
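Case 3. Select m images (e.g. m=470) from the database for the training set such that at least i images (e.g. i=2) of every subject are present (i<c). The training set then has m images, and the test set contains the remaining 200*c-m images.

I want to write the code in MATLAB. Any help would be greatly appreciated. Thanks in advance.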
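The third case, as described in the text, could be sketched as follows: first reserve the minimum number of images for every subject, then top the training set up to m images from whatever is left. This is one possible approach under stated assumptions, not a definitive implementation. The toy sizes and the names min_per (standing in for i), is_train, train_idx and test_idx are hypothetical.

```matlab
% Hypothetical toy data: 10 subjects with c = 8 images each.
c = 8;  n_subj = 10;
labels  = repelem(1:n_subj, c);      % labels(n) = subject of image n
n_total = numel(labels);
m = 35;                              % total number of training images
min_per = 2;                         % at least this many images per subject
assert(min_per*n_subj <= m && m <= n_total, 'm is infeasible');

is_train = false(1, n_total);
subjects = unique(labels);
for s = 1:numel(subjects)
    members = find(labels == subjects(s));
    members = members(randperm(numel(members)));
    is_train(members(1:min_per)) = true;   % guaranteed minimum per subject
end

pool = find(~is_train);                    % images not chosen yet
pool = pool(randperm(numel(pool)));
n_extra = m - nnz(is_train);               % how many are still missing
is_train(pool(1:n_extra)) = true;          % top up to exactly m images

train_idx = find(is_train);
test_idx  = find(~is_train);               % the remaining n_total - m images
```

Using a logical mask keeps the two sets disjoint by construction, so no set difference is needed.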
EDIT: I tried to implement this in MATLAB. Here is my code:
%% Read the data
%% My data reads as follows:
Name Size Bytes Class Attributes
a_data 99x1 12672 cell
a_labels 1x99 792 double
c 1x1 8 double
card_a 11x2 176 double
unq_a_lab 1x11 88 double
% where a_data is my total dataset.
% Assume that it contains 99 images in total.
% a_labels holds the subject label of each image.
% c is the minimum number of images per subject,
% calculated as c = min(card(subj1), card(subj2), ...)
% card_a is the cardinality of each class (subject) present in the database
% card_a = [1,2,3,4,......;10,9,11,9,.....] i.e. card of subj 1 = 10
% card of subj 2 = 9 ,...etc
% unq_a_lab : the unique subject labels present in the database.
% Assume there are 11 of them (as given).
%% CASE 1 COMPLETELY OVERLAPPING DATASET EQUAL SIZED PARTITIONS
% Randomly split the dataset into training and testing subsets
% trainset - k1 images per subject
% testset - k2 images per subject
% bear in mind the constraint: k1+k2<=c
% Total training set = k1*no. of subjects
% Total testing set = k2*no. of subjects
% Both training and testing sets (subjects) are completely overlapping
%split 1
k1 = 3;
%split 2
k2 = 3;
Train_data_a = cell(length(unq_a_lab)*k1,1);
Test_data_a = cell(length(unq_a_lab)*k2,1);
tr_a_labels = zeros(1,length(unq_a_lab)*k1);
tst_a_labels = zeros(1,length(unq_a_lab)*k2);
t1=0; t2=0;
for i=1:length(unq_a_lab)
    % NOTE: the index c*(i-1)+id(j) assumes that every subject has exactly
    % c images, stored contiguously in a_data in subject order
    id = randperm(c);
    % split it into 1:k1 (train) and k1+1:k1+k2 (test) points
    for j=1:k1
        Train_data_a{t1+j} = a_data{c*(i-1)+id(j)};
        tr_a_labels(1,t1+j) = a_labels(c*(i-1)+id(j));
    end
    for j=1:k2
        Test_data_a{t2+j} = a_data{c*(i-1)+id(j+k1)};
        tst_a_labels(1,t2+j) = a_labels(c*(i-1)+id(j+k1));
    end
    t1 = t1+k1; t2 = t2+k2;
end
%% CASE 2 COMPLETELY NON-OVERLAPPING DATASETS EQUAL SIZED PARTITIONS
% Randomly split the dataset into training and testing subsets
% trainset - k1 images per subject
% testset - k2 images per subject
% Total training set = k1* cardinality of Train Set
% Total testing set = k2* cardinality of Test Set
% cardinality of Train Set + cardinality of Test Set = Total cardinality of
% the database
% Both training and testing sets (subjects) are non-overlapping
% p1 = number of subjects in training set
% p2 = number of subjects in testing set
%split 1
k1 = 3;
%split 2
k2 = 3;
% size of the partitions
% p1 = number of classes in the training sets
% p2 = number of classes in the testing sets
size_p = length(unq_a_lab);
p1 = randi(size_p-1); % 1..size_p-1, so neither set is empty
p2 = size_p-p1;
Train_data_a = cell(p1*k1,1);
Test_data_a = cell(p2*k2,1);
tr_a_labels = zeros(1,p1*k1);
tst_a_labels = zeros(1,p2*k2);
t1=0; t2=0;
for i=1:length(unq_a_lab)
    id = randperm(c);
    % split it into 1:k1 and 1:k2 points
    if i<=p1
        for j=1:k1
            Train_data_a{t1+j} = a_data{c*(i-1)+id(j)};
            tr_a_labels(1,t1+j) = a_labels(c*(i-1)+id(j));
        end
        t1 = t1+k1;
    end
    if i>p1
        for j=1:k2
            Test_data_a{t2+j} = a_data{c*(i-1)+id(j)};
            tst_a_labels(1,t2+j) = a_labels(c*(i-1)+id(j));
        end
        t2 = t2+k2;
    end
end
With randomization, so that p1 subjects are chosen at random from all the subjects and the rest form the p2 subjects:
%split 1
k1 = 3;
%split 2
k2 = 3;
% size of the partitions
% p1 = number of classes in the training sets
% p2 = number of classes in the testing sets
size_p = length(unq_a_lab);
p1 = randi(size_p-1); % 1..size_p-1, so neither set is empty
p2 = size_p-p1;
Train_data_a = cell(p1*k1,1);
Test_data_a = cell(p2*k2,1);
tr_a_labels = zeros(1,p1*k1);
tst_a_labels = zeros(1,p2*k2);
x = randperm(length(unq_a_lab));
t1=0; t2=0;
for i=1:length(unq_a_lab)
    id = randperm(c);
    % split it into 1:k1 and 1:k2 points
    if i<=p1
        for j=1:k1
            Train_data_a{t1+j} = a_data{c*(x(i)-1)+id(j)};
            tr_a_labels(1,t1+j) = a_labels(c*(x(i)-1)+id(j));
        end
        t1 = t1+k1;
    end
    if i>p1
        for j=1:k2
            Test_data_a{t2+j} = a_data{c*(x(i)-1)+id(j)};
            tst_a_labels(1,t2+j) = a_labels(c*(x(i)-1)+id(j));
        end
        t2 = t2+k2;
    end
end
%% CASE 3 COMPLETELY NON OVERLAPPING DATASETS UNEQUAL SIZED PARTITIONS
% Randomly split the dataset into training and testing subsets
% trainset - m images in total, each subject having at least i=floor(m/p1) images
% testset - k2 images per subject
% Total training set = m images
% Total testing set = k2*p2 images
% cardinality of Train Set + cardinality of Test Set = Total cardinality of
% the database
% Both training and testing sets (subjects) are non-overlapping
% size of the partitions
% p1 = number of classes in the training sets
% p2 = number of classes in the testing sets
size_p = length(unq_a_lab);
% p1 = round((size_p-1)*rand);
p1 = 6;
p2 = size_p-p1;
%split 1
m = 29;
min_reqd = floor(m/p1);
%split 2
k2 = 3;
Train_data_a = cell(m,1);
Test_data_a = cell(p2*k2,1);
tr_a_labels = zeros(1,m);
dummy_labels = tr_a_labels;
tst_a_labels = zeros(1,p2*k2);
x = randperm(length(unq_a_lab));
% filling up the first min_reqd for each class
t1=1;
for j=1:p1
    idx = randperm(c);
    idx = idx(1:min_reqd);
    for k=1:min_reqd
        dummy_labels(t1) = c*(x(j)-1)+idx(k);
        t1 = t1+1;
    end
end
% form the numberset
num_pack = zeros(1,c*p1);
t2=1;
for j=1:p1
    for k=1:c
        num_pack(1,t2) = c*(x(j)-1)+k;
        t2 = t2+1;
    end
end
% getting the indices that have not been already selected previously
% using the set difference operation
% setdiff(A,B) is the values of A that are not in B
new_a_labels = setdiff(num_pack,dummy_labels);
idx = randperm(length(new_a_labels));
% randomly selecting the left amount of values from the set difference
% subset
idx = new_a_labels(idx(1:m-(min_reqd*p1)));
% inserting the values into the matrix
dummy_labels(t1:t1+length(idx)-1) = idx;
% sorting the selected indices into ascending order
dummy_labels = sort(dummy_labels);
% using the indices of the dummy variables to get the training set and
% their corresponding labels
for i=1:m
    Train_data_a{i} = a_data{dummy_labels(i)};
    tr_a_labels(1,i) = a_labels(dummy_labels(i));
end
% getting the testing set as previously done in case 2
t2=0;
for i=1:length(unq_a_lab)
    % Random selection of k2 points for the testing set
    id = randperm(c);
    if i>p1
        for j=1:k2
            Test_data_a{t2+j} = a_data{c*(x(i)-1)+id(j)};
            tst_a_labels(1,t2+j) = a_labels(c*(x(i)-1)+id(j));
        end
        t2 = t2+k2;
    end
end
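I believe my CASE 1 and CASE 2 code is correct; please point out anything that is wrong. I need help with CASE 3: I have written it, but I am not at all sure it is correct.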