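I'm looking for advice on how to solve the following problem elegantly. Although performance is not an issue in my particular case, I'd appreciate comments on good practices.
Thanks in advance!
I'm trying to average matrix rows based on some logic while ignoring NaN values. The code I currently have does not treat NaN values the way I want.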
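My data is structured as follows: the first column contains the "bins" (sorted), and the remaining columns contain data that may include NaN values. Here is an example: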
DATA = [...
180 NaN NaN 1.733
180 NaN NaN 1.703
200 0.720 2.117 1.738
200 0.706 2.073 1.722
200 0.693 2.025 1.723
200 NaN NaN 1.729
210 NaN NaN 1.820
210 NaN NaN 1.813
210 NaN NaN 1.805
240 NaN NaN 1.951
240 NaN NaN 1.946
240 NaN NaN 1.946
270 NaN NaN 2.061
270 NaN NaN 2.052
300 0.754 2.356 2.103
300 0.758 2.342 2.057
300 NaN NaN 2.066
300 NaN NaN 2.066 ];
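The desired result is a matrix that contains the unique "bins" in its first column and means "unpolluted by NaNs" in the remaining columns, e.g.:
(0.720+0.706+0.693)/3=0.7063
Note the division by 3 (not 4) for this column+bin combination. Here is the desired result for the example above: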
RES = [...
180 NaN NaN 1.718
200 0.7063 2.072 1.728
210 NaN NaN 1.812
240 NaN NaN 1.948
270 NaN NaN 2.056
300 0.756 2.349 2.074 ];
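To make the averaging rule explicit, here it is written as a formula. The notation is introduced only for illustration: x_{i,j} is row i, column j of DATA, and S_{b,j} is the set of rows that may contribute to bin b in column j.

```latex
\bar{x}_{b,j} = \frac{1}{\lvert S_{b,j} \rvert} \sum_{i \in S_{b,j}} x_{i,j},
\qquad
S_{b,j} = \{\, i : x_{i,1} = b \ \text{and}\ x_{i,j} \ \text{is not NaN} \,\}

% Example: b = 200, j = 2 gives S_{200,2} = \{3, 4, 5\} (row 6 is NaN), so
\bar{x}_{200,2} = \frac{0.720 + 0.706 + 0.693}{3} \approx 0.7063
```

When S_{b,j} is empty (for instance bin 180 in column 2), there is nothing to average and the entry should stay NaN, which is what the desired RES shows.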
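Here is some code I managed to put together from several sources. It works for column+bin combinations that contain only NaNs or only numbers.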
nDataCols=size(DATA,2)-1;
[u,m,n] = unique(DATA(:,1));
sz = size(m);
N=accumarray(n,1,sz);
RES(length(u),nDataCols) = 0; %Preallocation
for ind1 = 1:nDataCols
RES(:,ind1)=accumarray(n,DATA(:,ind1+1),sz)./N;
end
RES= [u,RES];
Here is what I currently get:
RES = [...
180 NaN NaN 1.718
200 NaN NaN 1.728
210 NaN NaN 1.812
240 NaN NaN 1.948
270 NaN NaN 2.056
300 NaN NaN 2.074 ];
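The NaNs propagate because accumarray's default accumulation is a sum, and any sum that includes a NaN is NaN. A minimal sketch of a vectorized fix that needs no toolbox, assuming the first column is the grouping key: replace NaNs with 0 before summing, count the non-NaN entries separately, and divide. The names vals, valid, sums and counts are illustrative and not part of the original code.

```matlab
% Example data from the question: bins in column 1, measurements (with NaN) in columns 2-4.
DATA = [180 NaN NaN 1.733; 180 NaN NaN 1.703;
        200 0.720 2.117 1.738; 200 0.706 2.073 1.722;
        200 0.693 2.025 1.723; 200 NaN NaN 1.729;
        210 NaN NaN 1.820; 210 NaN NaN 1.813; 210 NaN NaN 1.805;
        240 NaN NaN 1.951; 240 NaN NaN 1.946; 240 NaN NaN 1.946;
        270 NaN NaN 2.061; 270 NaN NaN 2.052;
        300 0.754 2.356 2.103; 300 0.758 2.342 2.057;
        300 NaN NaN 2.066; 300 NaN NaN 2.066];

[u, ~, n] = unique(DATA(:,1));   % n maps each row to the index of its bin in u
n = n(:);                        % accumarray expects a column of subscripts

vals  = DATA(:, 2:end);
valid = ~isnan(vals);            % true where a measurement exists
vals(~valid) = 0;                % zero out NaNs so they do not poison the sums

nBins  = numel(u);
nCols  = size(vals, 2);
sums   = zeros(nBins, nCols);
counts = zeros(nBins, nCols);
for j = 1:nCols
    sums(:, j)   = accumarray(n, vals(:, j), [nBins 1]);           % sum of valid values per bin
    counts(:, j) = accumarray(n, double(valid(:, j)), [nBins 1]);  % number of valid values per bin
end

RES = [u, sums ./ counts];       % 0/0 evaluates to NaN for bins with no valid data
```

Bins that have no valid value in a column end up as 0/0, which MATLAB evaluates to NaN, so those entries match the desired output.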
Answer 0 (score: 5)
One possible approach: find the changes in the first column (exploiting the fact that it is pre-sorted) and apply nanmean to each block of rows:
ind = find(diff([-inf; (DATA(:,1)); inf])~=0); %// value changed: start of block
r = arrayfun(@(n) nanmean(DATA(ind(n):ind(n+1)-1,:)), 1:numel(ind)-1, 'uni', 0);
RES = vertcat(r{:});
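To see how the block boundaries are found, here is the ind computation from the answer applied to the example's first column on its own. Prepending -inf and appending inf guarantees a nonzero difference at the first row and adds a one-past-the-end sentinel after the last row, so block k always spans rows ind(k) to ind(k+1)-1. This only illustrates the indexing; it is not a replacement for the answer's code.

```matlab
% First column of the example data (the bins), already sorted.
bins = [180 180 200 200 200 200 210 210 210 240 240 240 270 270 300 300 300 300]';

% Positions where the value changes, plus the sentinel at numel(bins)+1.
ind = find(diff([-inf; bins; inf]) ~= 0);
% ind is [1; 3; 7; 10; 13; 15; 19]

for k = 1:numel(ind)-1
    rows = ind(k):ind(k+1)-1;    % rows belonging to block k
    fprintf('bin %d: rows %d-%d\n', bins(ind(k)), rows(1), rows(end));
end
```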
You can replace arrayfun with an explicit loop. That may be faster, since it avoids the overhead introduced by the cell array:
ind = find(diff([-inf; (DATA(:,1)); inf])~=0); %// value changed: start of block
RES = zeros(numel(ind)-1, size(DATA,2)); %// preallocate
for n = 1:numel(ind)-1 %// loop over blocks
RES(n,:) = nanmean(DATA(ind(n):ind(n+1)-1,:));
end
Your approach works too. You just need to call accumarray with a handle to the nanmean function. This doesn't require the first column to be pre-sorted:
nDataCols = size(DATA,2)-1;
[u, ~, n] = unique(DATA(:,1));
RES = zeros(length(u), nDataCols); %// Preallocation
for ind1 = 1:nDataCols
RES(:,ind1) = accumarray(n, DATA(:,ind1+1), [], @nanmean);
end
RES = [u, RES];
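nanmean comes from the Statistics Toolbox. If that toolbox is not available, the same accumarray call can take an anonymous function instead. A minimal sketch assuming only core MATLAB; nanFreeMean is a hypothetical name chosen for this example.

```matlab
% Example data from the question.
DATA = [180 NaN NaN 1.733; 180 NaN NaN 1.703;
        200 0.720 2.117 1.738; 200 0.706 2.073 1.722;
        200 0.693 2.025 1.723; 200 NaN NaN 1.729;
        210 NaN NaN 1.820; 210 NaN NaN 1.813; 210 NaN NaN 1.805;
        240 NaN NaN 1.951; 240 NaN NaN 1.946; 240 NaN NaN 1.946;
        270 NaN NaN 2.061; 270 NaN NaN 2.052;
        300 0.754 2.356 2.103; 300 0.758 2.342 2.057;
        300 NaN NaN 2.066; 300 NaN NaN 2.066];

% Mean of the non-NaN entries; mean of an empty column is NaN.
nanFreeMean = @(v) mean(v(~isnan(v)));

nDataCols = size(DATA,2) - 1;
[u, ~, n] = unique(DATA(:,1));
n = n(:);                                 % column of bin indices for accumarray
RES = zeros(numel(u), nDataCols);         % preallocation
for j = 1:nDataCols
    RES(:, j) = accumarray(n, DATA(:, j+1), [numel(u) 1], nanFreeMean);
end
RES = [u, RES];
```

Because every index 1..numel(u) occurs in n, each bin passes at least one value to the function, and bins whose values are all NaN come out as NaN rather than 0.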
Answer 1 (score: 0)
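Here's another solution, though it's extremely inefficient. Also, the output array will have all NaN values set to 0. Let's just say it's useful for academic purposes. Here are the steps I took: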
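1. Find the unique values in the first column.
2. Split DATA into a cell array with one cell per column.
3. For every data column, prepend the first column so that each cell holds [bin, value] pairs.
4. In each cell, remove the rows whose value is NaN.
5. Run accumarray on each cell with mean as the function handle.
6. Index into the accumarray results with the unique bins and convert back to a matrix.
%// Step #1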
num = unique(DATA(:,1));
%// Step #2
cells = mat2cell(DATA, size(DATA,1), ones(size(DATA,2),1));
%// Step #3
cellsAppend = cellfun(@(x) [DATA(:,1) x], cells(2:end), 'uni', false);
%// Step #4
cellsNonNaN = cellfun(@(x) x(~isnan(x(:,2)),:), cellsAppend , 'uni', false);
%// Step #5
cellsMean = cellfun(@(x) accumarray(x(:,1), x(:,2), [], @mean), cellsNonNaN, 'uni', false);
%// Step #6
selectCells = cellfun(@(x) x(num), cellsMean, 'uni', false);
RES = [num cell2mat(selectCells)];
The result is:
RES =
180.0000 0 0 1.7180
200.0000 0.7063 2.0717 1.7280
210.0000 0 0 1.8127
240.0000 0 0 1.9477
270.0000 0 0 2.0565
300.0000 0.7560 2.3490 2.0730
As you can see, it's extremely inefficient, especially given the number of cellfun calls I make, but it's still an academic example!
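The zeros come from accumarray's default fill value for indices that receive no data. A sketch of the same six steps, assuming NaN is wanted instead: step 5 passes an explicit size and a fill value of NaN through the accumarray(subs, val, sz, fun, fillval) form. Using the size [max(num) 1] also keeps x(num) in step 6 in range even if a column has no valid data in its largest bin. 'UniformOutput' is spelled out in full here.

```matlab
% Example data from the question.
DATA = [180 NaN NaN 1.733; 180 NaN NaN 1.703;
        200 0.720 2.117 1.738; 200 0.706 2.073 1.722;
        200 0.693 2.025 1.723; 200 NaN NaN 1.729;
        210 NaN NaN 1.820; 210 NaN NaN 1.813; 210 NaN NaN 1.805;
        240 NaN NaN 1.951; 240 NaN NaN 1.946; 240 NaN NaN 1.946;
        270 NaN NaN 2.061; 270 NaN NaN 2.052;
        300 0.754 2.356 2.103; 300 0.758 2.342 2.057;
        300 NaN NaN 2.066; 300 NaN NaN 2.066];

% Step 1: unique bins
num = unique(DATA(:,1));
% Step 2: one cell per column
cells = mat2cell(DATA, size(DATA,1), ones(1, size(DATA,2)));
% Step 3: pair every data column with the bin column
cellsAppend = cellfun(@(x) [DATA(:,1) x], cells(2:end), 'UniformOutput', false);
% Step 4: drop rows whose value is NaN
cellsNonNaN = cellfun(@(x) x(~isnan(x(:,2)),:), cellsAppend, 'UniformOutput', false);
% Step 5 (changed): explicit size, NaN for bins that received no values
cellsMean = cellfun(@(x) accumarray(x(:,1), x(:,2), [max(num) 1], @mean, NaN), ...
                    cellsNonNaN, 'UniformOutput', false);
% Step 6: pick out the rows for the actual bins and rebuild the matrix
selectCells = cellfun(@(x) x(num), cellsMean, 'UniformOutput', false);
RES = [num cell2mat(selectCells)];
```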