Regression in a FOR loop in MATLAB

Date: 2013-10-28 19:34:13

Tags: matlab for-loop regression

I have the following code:

colBIN = {0.050, 0.055, 0.060, 0.065, 0.070, 0.075, 0.080, 0.085, 0.090, 0.095,0.1};

for i = 1 : length(colBIN)-1
    colBIN{i,2} = find(cols(:,1) <= cell2mat(colBIN(i+1,1)) & cols(:,1) > cell2mat(colBIN(i,1)));
end

rowBIN = {0.045, 0.046, 0.047, 0.048, 0.049, 0.050, 0.051, 0.052};

for i = 1 : length(rowBIN)-1
    rowBIN{i,2} = find(rows(:,1) <= cell2mat(rowBIN(i+1,1)) & rows(:,1) > cell2mat(rowBIN(i,1))); 
end

binCombos = cell(length(rowBIN)-1,length(colBIN)-1);

for m = 1 : length(rowBIN)-1
    for n = 1 : length(colBIN)-1
        binCombos{n,m} = intersect( rowBIN{m,2}(:,1),colBIN{n,2}(:,1));
    end
end


binRows = size(binCombos,1);
binCols = size(binCombos,2)-1;

j = j + 1;
for n = 1 : binRows; 
    for m = 1 : binCols;
       thisBin = binCombos{n,m}(:,:); 
       if isempty(thisBin)==0

       %polyfit
       quadmod = polyfit(x_vrbl(thisBin), y_vrbl(thisBin), 2);
       interval = 0.0:0.001:1;
       quadmodcurve = polyval(quadmod,interval); 
       [r2 rmse] = rsquare(y_vrbl(thisBin), quadmodcurve); 
       plot(x_vrbl(thisBin), y_vrbl(thisBin), '*', interval, quadmodcurve);
       xlabel('x_vrbl');
       ylabel('y_vrbl');
       axis([0,1,0,1]);
       header = ['R^2 =' num2str(r2),'coeffs:',num2str(quadmod)];
       title(header);
       saveas(gcf, sprintf('plot_%d.pdf', j));

       %residuals
       res = y_vrbl(thisBin) - quadmodcurve;
       plot(x_vrbl(thisBin),res,'+');
       header2 = ['residuals'];
       title(header2);
       saveas(gcf, sprintf('residuals_%d.pdf', j));

       end
       j = j + 1;
   end
end

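Explanation/problem:

binCombos is a 2-D cell array in which each cell holds a different number of data points. I am fitting a quadratic curve to the data in each cell and trying (unsuccessfully) to output the R^2 value and to plot the residuals.

I think the problem is that the 'interval' passed to polyval does not have the same array size as y_vrbl(thisBin), so the sizes do not match when rsquare is called, and the same mismatch occurs when the residuals are computed. For example, if I set interval = x_vrbl(thisBin), the residuals "work" but the polyfit plot is all messed up.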

2 answers:

Answer 0: (score: 0)

My guess is that this should work:

quadmodcurve = polyval(quadmod,x_vrbl(thisBin)); % evaluate the fit at the sample x-values
[r2 rmse] = rsquare(y_vrbl(thisBin), quadmodcurve);
interval = 0.0:0.001:1;
quadmodcurve = polyval(quadmod,interval); 

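To assess the quality of the fit, you must evaluate the polynomial only at the x-values of your samples, so that the result has the same size as y_vrbl(thisBin). To plot the full polynomial, you need to evaluate it at many more, regularly spaced x-values.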
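A minimal sketch of the two separate evaluations described above. The data here are made-up stand-ins for x_vrbl(thisBin) and y_vrbl(thisBin), and the names x, y, yfit and curve are hypothetical:

```matlab
% Made-up sample data (assumption): stands in for one bin of the real data
x = [0.10; 0.25; 0.40; 0.55; 0.70; 0.85];
y = 0.2 + 0.5*x - 0.3*x.^2;            % exact quadratic, so residuals are ~0

quadmod = polyfit(x, y, 2);            % coefficients, highest power first

% Fit quality: evaluate at the sample x-values so the sizes match y
yfit = polyval(quadmod, x);
res  = y - yfit;                       % same size as y

% Plotting: evaluate on a dense, regularly spaced grid
interval = 0:0.001:1;
curve = polyval(quadmod, interval);    % used for display only
```

With these two results, plot(x, y, '*', interval, curve) would draw the samples and the smooth curve, while res can be plotted against x without any size mismatch.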

Answer 1: (score: 0)

I managed to run the code using the data from http://dropproxy.com/f/4B6 and the rsquare function from the File Exchange, after correcting some errors:

d = importdata('sample_data.xlsx');
y_vrbl = d.data(:, 1);
x_vrbl = d.data(:, 2);
rows = d.data(:, 3);
cols = d.data(:, 4);

cb = {0.050, 0.055, 0.060, 0.065, 0.070, 0.075, 0.080, 0.085, 0.090, 0.095,0.1};

for i = 1 : length(cb)-1
    colBIN{i,2} = find(cols(:,1) <= cell2mat(cb(i+1)) & cols(:,1) > cell2mat(cb(i)));
end

rb = {0.045, 0.046, 0.047, 0.048, 0.049, 0.050, 0.051, 0.052};

for i = 1 : length(rb)-1
    rowBIN{i,2} = find(rows(:,1) <= cell2mat(rb(i+1)) & rows(:,1) > cell2mat(rb(i)));
end

binCombos = cell(length(rowBIN)-1,length(colBIN)-1);

for m = 1 : length(rowBIN)-1
    for n = 1 : length(colBIN)-1
        binCombos{n,m} = intersect( rowBIN{m,2}(:,1),colBIN{n,2}(:,1));
    end
end


binRows = size(binCombos,1);
binCols = size(binCombos,2)-1;

j = 1;
for n = 1 : binRows;
    for m = 1 : binCols;
        thisBin = binCombos{n,m}(:,:);
        if ~isempty(thisBin)

            % polyfit
            quadmod = polyfit(x_vrbl(thisBin), y_vrbl(thisBin), 2);

            % compute residuals and R²
            quadmodcurve = polyval(quadmod,x_vrbl(thisBin)); % evaluate at the sample x-values
            [r2, rmse] = rsquare(y_vrbl(thisBin), quadmodcurve);
            res = y_vrbl(thisBin) - quadmodcurve;

            % plot fit
            interval = 0.0:0.001:1;
            quadmodcurve = polyval(quadmod,interval);
            plot(x_vrbl(thisBin), y_vrbl(thisBin), '*', interval, quadmodcurve);
            xlabel('x_vrbl');
            ylabel('y_vrbl');
            axis([0,1,0,1]);
            header = ['R^2 =' num2str(r2),'coeffs:',num2str(quadmod)];
            title(header);
            saveas(gcf, sprintf('plot_%d.pdf', j));

            % plot residuals
            plot(x_vrbl(thisBin),res,'+');
            header2 = ['residuals'];
            title(header2);
            saveas(gcf, sprintf('residuals_%d.pdf', j));

        end
        j = j + 1;
    end
end

The fits look fine to me, except that in most cases a linear function would probably be enough and the quadratic term is not needed.
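One way to check whether the quadratic term earns its place is to fit both degrees to the same bin and compare the residual error. The sketch below uses made-up, nearly linear data; the names p1, p2, rmse1 and rmse2 are hypothetical and not part of the code above:

```matlab
% Made-up, nearly linear data (assumption): stands in for one bin
x = (0.1:0.1:0.9)';
y = 0.05 + 0.8*x + 0.01*sin(20*x);     % straight line plus a small wiggle

p1 = polyfit(x, y, 1);                 % linear fit
p2 = polyfit(x, y, 2);                 % quadratic fit

% Root-mean-square error of each fit, evaluated at the sample x-values
rmse1 = sqrt(mean((y - polyval(p1, x)).^2));
rmse2 = sqrt(mean((y - polyval(p2, x)).^2));

fprintf('RMSE linear: %.4g, quadratic: %.4g\n', rmse1, rmse2);
```

The quadratic fit can never have a larger least-squares error than the linear one on the same data, so the useful question is whether the improvement is large enough to matter for the bin at hand.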

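Regarding the rest of your question: I am not an expert on using R² for nonlinear fits (see note 2 on Coefficient of determination), but the implementation you are using looks a bit dubious to me. The reason the output is 0 most of the time is the max on line 65 of rsquare.m, which prevents negative values from being returned. Since a polynomial fit does include a constant term, calling the function as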

[r2, rmse] = rsquare(y_vrbl(thisBin), quadmodcurve, false);

seems more appropriate and yields R² > 0.9 in most cases.

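My advice: check whether R² is the right goodness-of-fit measure for your case, and check whether the function is implemented correctly. Functions that ship with Matlab work out of the box, but submissions on the Matlab File Exchange come with no quality assurance.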
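For a cross-check that does not depend on the File Exchange function, R² and RMSE can be computed directly from their usual definitions with built-in functions only. This is a sketch under assumptions: the data are made up, the variable names are hypothetical, and it uses the common definition R² = 1 - SS_res/SS_tot, which is not guaranteed to match what rsquare.m computes:

```matlab
% Made-up sample data (assumption): stands in for one bin of the real data
x = [0.10; 0.25; 0.40; 0.55; 0.70; 0.85];
y = [0.22; 0.31; 0.33; 0.40; 0.38; 0.41];

p    = polyfit(x, y, 2);
yfit = polyval(p, x);                  % evaluate at the sample x-values

ssRes = sum((y - yfit).^2);            % residual sum of squares
ssTot = sum((y - mean(y)).^2);         % total sum of squares about the mean
r2    = 1 - ssRes/ssTot;               % coefficient of determination
rmse  = sqrt(mean((y - yfit).^2));     % divides by n; other conventions exist
```

For a least-squares polynomial fit that includes a constant term, this R² lies between 0 and 1, so a value clamped to 0 by another implementation would be worth investigating.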