Question

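I want to use 10-fold cross-validation to test which polynomial form (first, second, or third order) gives the better fit. I want to split my data set into 10 subsets and hold out 1 of the 10. I would then derive a regression model without that subset, use the derived model to predict the output values for the held-out subset, and compute the residuals. Finally, I would repeat the routine for every subset and sum the squares of the resulting residuals. I have written the following code in Matlab 2013b, which samples the data and tests the regression on the training data. I am stuck on how to repeat this for each subset and how to compare which polynomial form fits better.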

% Sample the data
parm = [AT];
n = length(parm);
k = 10;                 % how many parts to use
allix = randperm(n);    % all data indices, randomly ordered
numineach = ceil(n/k);  % at least one part must have this many data points
allix = reshape([allix NaN(1,k*numineach-n)],k,numineach);
for p=1:k
    testix = allix(p,:);            % indices to use for testing
    testix(isnan(testix)) = [];     % remove NaNs if necessary
    trainix = setdiff(1:n,testix);  % indices to use for training
    %train = parm(trainix); %gives the training data
    %test = parm(testix);  %gives the testing data
end

% Derive regression on the training data 
Sal = Salinity(trainix);
Temp = Temperature(trainix);
At = parm(trainix);

xyz =[Sal Temp At];
% Fit a Polynomial Surface
surffit = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');
% Shows equation, rsquare, rmse 
[b,bint,r] = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');

Answer 1

To run the code for each subset, you can put the fit inside the loop and store the results, for example:

% Sample the data
parm = [AT];
n = length(parm);
k = 10;                 % how many parts to use
allix = randperm(n);    % all data indices, randomly ordered
numineach = ceil(n/k);  % at least one part must have this many data points
allix = reshape([allix NaN(1,k*numineach-n)],k,numineach);

bAll = []; bintAll = []; rAll = [];

for p=1:k
    testix = allix(p,:);            % indices to use for testing
    testix(isnan(testix)) = [];     % remove NaNs if necessary
    trainix = setdiff(1:n,testix);  % indices to use for training
    %train = parm(trainix); %gives the training data
    %test = parm(testix);  %gives the testing data

    % Derive regression on the training data
    Sal = Salinity(trainix);
    Temp = Temperature(trainix);
    At = parm(trainix);

    xyz = [Sal Temp At];
    % Fit a polynomial surface (one call is enough). fit returns the fit
    % object, a goodness-of-fit struct (rsquare, rmse, ...) and an output
    % struct (training residuals, ...); b, bint and r hold these three.
    [b,bint,r] = fit([xyz(:,1), xyz(:,2)],xyz(:,3), 'poly11');

    % Store one row of coefficients and one struct per fold
    bAll = [bAll; coeffvalues(b)]; bintAll = [bintAll,bint]; rAll = [rAll,r];
end

As for the best fit, you would probably pick the polynomial form with the lowest RMSE.
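The question asks for the sum of squared residuals on the held-out subsets, so one way to compare the three forms is to run the same folds for each model and accumulate that sum. The following is only a minimal sketch, not the code from the answer above: the function name cvPolySurface and its variable names are made up for illustration, it assumes the Curve Fitting Toolbox library models 'poly11', 'poly22' and 'poly33', and it assumes every training set has at least as many points as the largest model has coefficients (10 for 'poly33').

```matlab
function sse = cvPolySurface(x, y, z, k)
% cvPolySurface  k-fold cross-validated SSE for three polynomial surfaces.
%   (Hypothetical helper, save as cvPolySurface.m.)
%   x, y, z : vectors of equal length (e.g. Salinity, Temperature, AT)
%   k       : number of folds
%   sse(m)  : sum of squared held-out residuals for models{m}

x = x(:); y = y(:); z = z(:);            % force column vectors
models = {'poly11', 'poly22', 'poly33'}; % 1st, 2nd and 3rd order surfaces
n = numel(z);

% Give every data point a random fold label 1..k; fold sizes differ by at most 1
foldOf = zeros(n, 1);
foldOf(randperm(n)) = mod(0:n-1, k) + 1;

sse = zeros(1, numel(models));
for m = 1:numel(models)
    for p = 1:k
        test  = (foldOf == p);           % logical mask of the held-out fold
        train = ~test;

        % Fit on the training folds only
        sf = fit([x(train), y(train)], z(train), models{m});

        % Predict the held-out points and accumulate squared residuals
        res = z(test) - feval(sf, x(test), y(test));
        sse(m) = sse(m) + sum(res.^2);
    end
end
```

Called as sse = cvPolySurface(Salinity, Temperature, AT, 10), the form with the smallest entry, found with [~, best] = min(sse), would be the preferred one under this criterion. The same fold labels are reused for all three models so that the comparison is not affected by different random splits.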

10-fold cross-validation for polynomial regression
