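I am trying to retrieve the indices of exact, row-specific matches between two large matrices. I have an n x 61 matrix A containing values from 0 to 9, and another n x 61 matrix B in which each row also contains values from 0 to 9 but is mostly NaN (only 2 to 8 columns in each row of B hold actual numbers). Matrix A can be expected to have 1.5 to 3 million rows, while matrix B has roughly 0.2 to 0.5 million rows. Here is an example setup:
% create matrix A with random data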
dataSample = [0 9];
numRows = 1000000;
numCols = 61;
A = randi(dataSample,numRows,numCols);
% create matrix B with random data
numRows = 100000;
numCols = 61;
numColsUse = 2:8;
dataRange = 0:9;
B = NaN(numRows,numCols);
for i = 1:size(B,1)
% randomly select number of columns to fill
numColsFill = datasample(numColsUse,1);
% randomly select distinct column indices from available columns
colIdx = datasample(1:numCols,numColsFill,'Replace',false);
% randomly select values from 0 to 9
numFill = datasample(dataRange,numColsFill);
% insert numbers at respective column in matrix B
B(i,colIdx) = numFill;
end
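I want to compare every row of matrix A against the whole of matrix B and find the exact matches, where the numbers in a row of B equal the numbers of A at their respective positions (columns). The NaN entries in matrix B are therefore ignored.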
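To make the matching rule concrete, here is a minimal sketch on toy data. The values and the variable names (b, cols, pattern, matchedRows) are made up for illustration and are not part of my real dataset. A row of A counts as a match when it agrees with the B row in every column where the B row is not NaN.

```matlab
% Toy data (hypothetical values, chosen only to illustrate the rule)
A = [1 2 3 4;
     5 2 7 4;
     1 9 3 0];
b = [NaN 2 NaN 4];            % one row of B: only columns 2 and 4 are set

cols = ~isnan(b);             % logical mask of the columns that matter
% Replicate the pattern explicitly so this runs on any MATLAB release
pattern = repmat(b(cols), size(A,1), 1);
isMatch = all(A(:,cols) == pattern, 2);
matchedRows = find(isMatch);  % rows 1 and 2 of A match, row 3 does not
```

Rows 1 and 2 of A have a 2 in column 2 and a 4 in column 4, so both match; columns 1 and 3 are never looked at.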
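I can get the desired result with cellfun: I slice matrix A into several subsets and then use a custom function to compare each row of the subset with every row of matrix B, as follows: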
% put all rows of matrix B in single cell
cellB = {B};
% take subset of matrix A and convert to cell array
subA = A(1000:5000,:);
subA = num2cell(subA,2);
% prepare cellB to meet cellfun conditions
cellB = repmat(cellB, [size(subA,1) 1]);
% apply cellfun to retrieve index of each exact match
idxContainer = cellfun(@findMatch, cellB, subA, 'UniformOutput', false);
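The function findMatch looks like this:
function [ idx ] = findMatch( cellB, subA )
% cellB is the full matrix B, subA is a single row of A.
% lt/gt against NaN are always false, so NaN entries never count as a mismatch.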
idxCheckLT = lt(cellB, repmat(subA, [size(cellB,1) 1]));
idxCheckGT = gt(cellB, repmat(subA, [size(cellB,1) 1]));
idxCheck = idxCheckLT + idxCheckGT;
idxSum = sum(idxCheck,2);
idx = find(idxSum == 0);
end
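This approach works, but it seems very inefficient, especially in terms of RAM, because cellfun requires all inputs to have the same size, hence the replication of the same dataset. Any ideas on how to solve this in a more efficient way? Many thanks!
Answer 0 (score: 0)
How about: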
for br = 1:size(B,1)
abs_diff = abs(repmat(B(br,:),[size(A,1) 1]) - A);
abs_diff(isnan(abs_diff)) = 0;
match = abs_diff == 0;
ind = find(sum(match,2)==size(match,2));
matches{br} = [repmat(br,[length(ind) 1]) ind];
end
matches = cell2mat(matches');
Answer 1 (score: 0)
The following solution was given to me on the MathWorks forum:
matches = cell(size(B, 1), 1);
for Brow = 1:size(B, 1)
Bcols = find(~isnan(B(Brow, :)));
matchedrows = find(all(A(:, Bcols) == B(Brow, Bcols), 2));
matches{Brow} = [matchedrows, repmat(Brow, size(matchedrows))];
end
matches = cell2mat(matches);
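This only works in R2016b and later, because the == comparison relies on implicit expansion. Alternatively, replace the second line inside the for loop with the following line:
matchedrows = find(all(bsxfun(@eq, A(:, Bcols), B(Brow, Bcols)), 2));
The result is the same as the solution provided by Jed, but I think it is a bit faster because it only compares the non-NaN columns. Hope this helps!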
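For anyone who wants to check that the two answers agree, here is a small sketch that runs both on random data. The sizes (200 x 6 and 20 x 6), the seed, and the names m0, m1 and sameResult are my own assumptions for a quick test, not part of either answer. Note that the first answer returns [B row, A row] while the second returns [A row, B row], so the columns are swapped before comparing.

```matlab
% Small random test case (sizes and seed are arbitrary assumptions)
rng(0);
A = randi([0 9], 200, 6);
B = NaN(20, 6);
for i = 1:size(B,1)
    colIdx = randperm(6, 2);               % two distinct columns per row
    B(i, colIdx) = randi([0 9], 1, 2);
end

% Answer 0: full-width difference with NaNs zeroed out
m0 = cell(size(B,1), 1);
for br = 1:size(B,1)
    d = abs(repmat(B(br,:), [size(A,1) 1]) - A);
    d(isnan(d)) = 0;
    ind = find(all(d == 0, 2));
    m0{br} = [repmat(br, [length(ind) 1]) ind];   % [B row, A row]
end
m0 = cell2mat(m0);

% Answer 1 (bsxfun variant): compare only the non-NaN columns
m1 = cell(size(B,1), 1);
for Brow = 1:size(B,1)
    Bcols = find(~isnan(B(Brow,:)));
    matchedrows = find(all(bsxfun(@eq, A(:,Bcols), B(Brow,Bcols)), 2));
    m1{Brow} = [matchedrows, repmat(Brow, size(matchedrows))];  % [A row, B row]
end
m1 = cell2mat(m1);

% Swap the columns of m1 so both results use the [B row, A row] layout
sameResult = isequal(m0, m1(:, [2 1]));
```

If sameResult is true, both loops found the same (B row, A row) pairs on this test case. That is only a spot check on one random sample, not a proof.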