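Split a dataset into two subsets, e.g. "train" and "test". The training set contains 80% of the data and the test set contains the remaining 20%.
Splitting here means generating logical indices whose length equals the number of observations in the dataset, with 1 marking a training sample and 0 marking a test sample.
N = length(data.x)
Output: logical arrays named idxTrain and idxTest.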
Answer 0 (score: 1)
This should do the trick:
% Generate sample data...
data = rand(32000,1);
% Calculate the number of training entries...
train_off = round(numel(data) * 0.8);
% Split data into training and test vectors...
train = data(1:train_off);
test = data(train_off+1:end);
However, if you really want to rely on logical indexing, you can proceed as follows:
% Generate sample data...
data = rand(32000,1);
data_len = numel(data);
% Calculate the number of training entries...
train_count = round(data_len * 0.8);
% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];
% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
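You can also use the randsample function (part of the Statistics and Machine Learning Toolbox) to add some randomness to the extraction. Note, however, that each element is then drawn independently, so the number of training and test elements is only approximately 80/20 and changes every time you run the script: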
% Generate sample data...
data = rand(32000,1);
% Generate a random true/false indexing with unequally weighted probabilities...
is_training = logical(randsample([0 1],numel(data),true,[0.2 0.8]));
% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
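To see how far a weighted random draw drifts from an exact 80/20 split, it can help to count the elements in each subset. The following is only a minimal sketch: it swaps in a plain rand-based draw instead of randsample (so no toolbox is required), and the variable names train_count and test_count are introduced here for illustration.

```matlab
% Generate sample data...
data = rand(32000,1);
data_len = numel(data);
% Draw each element independently with probability 0.8 of being "training".
% (rand-based stand-in for the randsample call; an assumption of this sketch)
is_training = rand(data_len,1) < 0.8;
% Count how many elements ended up in each subset...
train_count = sum(is_training);
test_count = sum(~is_training);
% The fraction is close to, but usually not exactly, 0.8.
fprintf('train: %d, test: %d, train fraction: %.4f\n', ...
    train_count, test_count, train_count / data_len);
```

Running this several times should print slightly different counts each time, which is the behaviour described above.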
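You can avoid this problem by first generating exactly the right number of training and test indices and then shuffling them with randperm-based indexing: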
% Generate sample data...
data = rand(32000,1);
data_len = numel(data);
% Calculate the number of training entries...
train_count = round(data_len * 0.8);
% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];
% Shuffle the logical indexing...
is_training = is_training(randperm(data_len));
% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
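For reference, the same shuffled split can be written with the names used in the question. This is a sketch under assumptions: data is taken to be a struct whose observations live in the field x, nTrain, xTrain and xTest are names introduced here, and the rng call is optional and only fixes the seed for reproducibility.

```matlab
% Assumed layout: a struct with the observations in field x, as in the question.
data.x = rand(1000,1);
N = length(data.x);
% Optional: fix the random seed so the split is the same on every run.
rng(42);
% Exact number of training samples (80% of N, rounded)...
nTrain = round(0.8 * N);
% Mark nTrain randomly chosen positions as training samples...
idxTrain = false(N,1);
idxTrain(randperm(N, nTrain)) = true;
% Every remaining position is a test sample.
idxTest = ~idxTrain;
% Split the observations using the logical indices...
xTrain = data.x(idxTrain);
xTest = data.x(idxTest);
```

Here randperm(N, nTrain) picks nTrain distinct positions directly, so the two logical arrays are complementary and always contain exactly nTrain and N - nTrain true entries.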