我想使用10倍交叉验证方法,该方法测试哪种多项式形式(第一,第二,或 第三顺序)给出了更好的契合度。我想将我的数据集划分为10个子集,并从10个数据集中删除1个子集。导出没有此子集的回归模型,使用派生回归模型预测该子集的输出值,并计算残差。最后,重复每个子集的计算例程,并对得到的残差的平方求和。 我已经在Matlab 2013b上编写了以下代码,它对数据进行采样并测试训练数据的回归。我坚持如何为每个子集重复这个,以及如何比较哪个多项式形式更适合。
% Sample the data
parm = [AT];
n = length(parm);
k = 10; % how many parts to use
allix = randperm(n); % all data indices, randomly ordered
numineach = ceil(n/k); % at least one part must have this many data points
allix = reshape([allix NaN(1,k*numineach-n)],k,numineach);
for p=1:k
testix = allix(p,:); % indices to use for testing
testix(isnan(testix)) = []; % remove NaNs if necessary
trainix = setdiff(1:n,testix); % indices to use for training
%train = parm(trainix); %gives the training data
%test = parm(testix); %gives the testing data
end
% Derive regression on the training data
Sal = Salinity(trainix);
Temp = Temperature(trainix);
At = parm(trainix);
xyz =[Sal Temp At];
% Fit a Polynomial Surface
surffit = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
% Shows equation, rsquare, rmse
[b,bint,r] = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
答案 0 :(得分:0)
关于为每个子集执行代码,您可以将拟合放在循环中并存储结果,例如
% Sample the data
parm = [AT];
n = length(parm);
k = 10; % how many parts to use
allix = randperm(n); % all data indices, randomly ordered
numineach = ceil(n/k); % at least one part must have this many data points
allix = reshape([allix NaN(1,k*numineach-n)],k,numineach);
bAll = []; bintAll = []; rAll = [];
for p=1:k
testix = allix(p,:); % indices to use for testing
testix(isnan(testix)) = []; % remove NaNs if necessary
trainix = setdiff(1:n,testix); % indices to use for training
%train = parm(trainix); %gives the training data
%test = parm(testix); %gives the testing data
% Derive regression on the training data
Sal = Salinity(trainix);
Temp = Temperature(trainix);
At = parm(trainix);
xyz =[Sal Temp At];
% Fit a Polynomial Surface
surffit = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
% Shows equation, rsquare, rmse
[b,bint,r] = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
bAll = [bAll, coeffvalues(b)]; bintAll = [bintAll,bint]; rAll = [rAll,r];
end
关于最佳匹配,你可能会选择最低的rmse。