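I have a set of integers, say S = {1,...,10}, and two matrices N and M whose rows are some (but not necessarily all possible) permutations of elements from S, of orders 3 and 5 respectively, e.g. N = [1 2 3; 2 5 3; ...], M = [1 2 3 4 5; 2 4 7 8 1; ...].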
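A sub-permutation Q of a permutation P is just an indexed subset of P such that the elements of Q appear in the same relative order as they do in P. Example: [2,4,7] is a sub-permutation of [2,3,4,6,7,1], but [1,2,3] is not a sub-permutation of the latter.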
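For a single pair, this definition can be checked directly. The following is only a minimal sketch, assuming the entries of P are distinct (as they are for permutations); the variable names are purely illustrative and not taken from any of the code in this thread.

```matlab
%// Check the definition for one candidate Q against one permutation P.
P = [2 3 4 6 7 1];

%// ismember returns, for each element of Q, whether it occurs in P
%// and at which position (0 if it does not occur).
[found1, pos1] = ismember([2 4 7], P);
isSub1 = all(found1) && all(diff(pos1) > 0);   %// positions 1,3,5 are increasing

[found2, pos2] = ismember([1 2 3], P);
isSub2 = all(found2) && all(diff(pos2) > 0);   %// positions 6,1,2 are not increasing
```

Here isSub1 is true and isSub2 is false, matching the two cases in the example.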
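I need an efficient way (e.g. as vectorized as possible, with for-loops kept as small as possible) to find
(1) all permutations from M that have sub-permutations from N
and
(2) how many times each sub-permutation from N is found in M.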
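What I have so far is vectorized code that checks whether a given single sub-permutation is contained in M (and how many times), but then I have to use a parfor-loop through N, which gets slow for very large N. Note that if N is not too large, one could also attack the problem by simply constructing the admissible 5-tuples from the given 3-tuples and comparing the result to M, but that can quickly become much slower than simple brute-forcing if N is large enough.
A different way to view the problem is as follows: check whether N, modulo permutations of its rows, is a submatrix of M in a general sense, i.e. whether it is possible to obtain a permutation of the rows of N by deleting elements from M.
Apologies if my question is too elementary; my background is in arithmetic algebraic geometry and representation theory, and I am very new to MATLAB.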
EDIT: Here is my code for checking the presence of a single k-tuple in M:
function [A,f] = my_function(x,M)
%// returns all rows in M that contain x and the absolute frequency of x in M
%// suboptimal for checking combinations rather than permutations by at least ~ 50%
k = size(x,2);
m = size(M,1);
R = zeros(m,k);
I = R;
Z = I;
for j = 1:k
[R(:,j),I(:,j)] = max((M == x(j)),[],2);
Z(:,j) = R(:,j).*I(:,j);
end
z = zeros(m,k-1);
for j = 1:(k-1)
z(:,j) = (Z(:,j) > 0 & Z(:,j) < Z(:,j+1));
end
[v,~] = find(sum(z,2) == k-1);
A = M(v,:);
f = length(v);
end
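Using this function, checking through N is then just a simple matter of a (par)for loop, which I was hoping to avoid in favor of a faster vectorized solution.
Answer 0: (score: 2)
Approach #1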
[val,ind] = max(bsxfun(@eq,permute(M,[4 2 1 3]),permute(N,[2 3 4 1])),[],2)
matches = squeeze(all(diff(ind,[],1)>0,1).*all(val,1))
out1 = any(matches,2) %// Solution - 1
out2 = sum(matches,1) %// Solution - 2
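The approaches in this answer all rely on calling max on a logical comparison: the first output says whether a value occurs in a row at all, and the second gives the column of its first occurrence. A small illustration on made-up data (the matrix below is only an example, not the real input):

```matlab
%// Two sample rows; each row has distinct entries.
M = [8 9 4 1 10; 3 4 5 1 2];

%// Value 4 occurs in both rows: val is 1 and ind is its column.
[val, ind] = max(M == 4, [], 2);

%// Value 7 occurs in neither row: val is 0, and ind falls back to 1,
%// which is why the code also has to test all(val,...) and not only ind.
[val0, ind0] = max(M == 7, [], 2);
```

Requiring the indices to be strictly increasing (diff(...) > 0) and all values to be found is then exactly the sub-permutation test.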
Approach #2
Another approach that avoids permuting N, which might be better for long N:
[val,ind] = max(bsxfun(@eq,N,permute(M,[3 4 1 2])),[],4)
matches = squeeze(all(diff(ind,[],2)>0,2).*all(val,2))
out1 = any(matches,1) %// Solution - 1
out2 = sum(matches,2) %// Solution - 2
Approach #3
Memory-scroogey approach for large datasizes:
out1 = false(size(M,1),1); %// Storage for Solution - 1
out2 = zeros(size(N,1),1); %// Storage for Solution - 2
for k=1:size(N,1)
[val3,ind3] = max(bsxfun(@eq,N(k,:),permute(M,[1 3 2])),[],3);
matches = all(diff(ind3,[],2)>0,2).*all(val3,2);
out1 = or(out1,matches);
out2(k) = sum(matches);
end
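To see why the loop can be preferable for large inputs, here is a rough back-of-the-envelope estimate of the size of the logical comparison array alone. It is a sketch under stated assumptions: the sizes are illustrative (M with 320000 rows and 5 columns, N with 2100 rows and 3 columns), one byte per logical element is assumed, and the index arrays returned by max (which add further memory) are ignored. The variable names are hypothetical.

```matlab
%// Illustrative sizes: M is m-by-q, N is n-by-p.
m = 320000; q = 5;
n = 2100;   p = 3;

%// Fully vectorized approaches build one p*q*m*n logical array at once.
fullBytes = p * q * m * n;

%// The loop only builds an m*p*q logical array per iteration.
loopBytes = m * p * q;

fprintf('vectorized: %.2f GB, loop: %.2f MB\n', fullBytes/1e9, loopBytes/1e6);
```

Under these assumptions the fully vectorized comparison needs roughly 10 GB, while each loop iteration needs under 5 MB.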
Approach #4
Memory-scroogey approach for GPU:
gM = gpuArray(M);
gN = gpuArray(N);
gout1 = false(size(gM,1),1,'gpuArray'); %// GPU Storage for Solution - 1
gout2 = zeros(size(gN,1),1,'gpuArray'); %// GPU Storage for Solution - 2
for k=1:size(gN,1)
[val3,ind3] = max(bsxfun(@eq,gN(k,:),permute(gM,[1 3 2])),[],3);
matches = all(diff(ind3,[],2)>0,2).*all(val3,2);
gout1 = or(gout1,matches);
gout2(k) = sum(matches);
end
out1 = gather(gout1); %// Solution - 1
out2 = gather(gout2); %// Solution - 2
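Now, this GPU approach has blown away all the other approaches. It was run with M : 320000X5 and N : 2100X3 (the same as your input sizes), filled with random integers. With a GTX 750 Ti, it took just 13.867873 seconds!! So, if you have a GPU with sufficient memory, this might be your winning approach too.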
Approach #5
Extremely-memory-scroogey approach for GPU:
gM = gpuArray(M);
gN = gpuArray(N);
gout1 = false(size(gM,1),1,'gpuArray'); %// GPU Storage for Solution - 1
gout2 = zeros(size(gN,1),1,'gpuArray'); %// GPU Storage for Solution - 2
for k=1:size(gN,1)
[val2,ind2] = max(bsxfun(@eq,gM,permute(gN(k,:),[1 3 2])),[],2);
matches = all(diff(ind2,[],3)>0,3).*all(val2,3);
gout1 = or(gout1,matches);
gout2(k) = sum(matches);
end
out1 = gather(gout1); %// Solution - 1
out2 = gather(gout2); %// Solution - 2
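The only difference from Approach #4 is which operand gets permuted: here the short row N(k,:) is moved into the third dimension, so the large matrix M is used as-is. My reading (an assumption, not something stated above) is that this avoids creating a permuted copy of M in every iteration. The following CPU-only sketch, on small made-up data and without gpuArray, checks that both orientations produce the same matches for one row of N:

```matlab
M  = [8 9 4 1 10; 6 5 2 7 8; 4 1 9 2 10; 3 4 5 1 2];
Nk = [4 1 2];   %// one row of N

%// Orientation used in Approaches #3/#4: permute the large matrix M.
[val3, ind3] = max(bsxfun(@eq, Nk, permute(M, [1 3 2])), [], 3);
matchesA = all(diff(ind3, [], 2) > 0, 2) .* all(val3, 2);

%// Orientation used in Approach #5: permute only the short row Nk.
[val2, ind2] = max(bsxfun(@eq, M, permute(Nk, [1 3 2])), [], 2);
matchesB = all(diff(ind2, [], 3) > 0, 3) .* all(val2, 3);

sameResult = isequal(matchesA, matchesB);
```

Both variants flag rows 3 and 4 of this M as containing [4 1 2] in order.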
Answer 1: (score: 2)
How about this?
n = size(N,1);
m = size(M,1);
p = size(N,2);
pattern = (1:p).'; %'// will be used for checking if it's a subpermutation or not
result = false(m,n); %// preallocate result, and initialize to 0
for k = 1:size(N,1) %// loop over rows of N
[~, loc] = ismember(M, N(k,:)); %// entries of M equal to a value of N(k,:)
ind = find(sum(loc>0,2)==p); %// we need p matches per row of M
t = reshape(nonzeros(loc(ind,:).'),p,[]); %'// for those rows, take matches
ind2 = all(bsxfun(@eq,t,pattern)); %// ... see if they are in the right order
result(ind(ind2),k) = true; %// ... and if so, take note in result matrix
end
The matrix result contains 1 at position r, s if the s-th row of N is a sub-permutation of the r-th row of M. From this, your desired results are
result1 = any(result,2);
result2 = sum(result,1);
Example:
M =
8 9 4 1 10
6 5 2 7 8
4 1 9 2 10
3 4 5 1 2
N =
4 1 2
4 9 10
3 5 9
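For reference, here is the loop from this answer run on exactly these M and N as a self-contained script. The expected values in the comments were worked out by hand from the definition, so treat them as a check rather than as output copied from the original post.

```matlab
M = [8 9 4 1 10; 6 5 2 7 8; 4 1 9 2 10; 3 4 5 1 2];
N = [4 1 2; 4 9 10; 3 5 9];

p = size(N,2);
pattern = (1:p).';                               %// required order of matches
result = false(size(M,1), size(N,1));
for k = 1:size(N,1)
    [~, loc] = ismember(M, N(k,:));              %// position in N(k,:) of each entry of M, 0 if absent
    ind = find(sum(loc>0,2) == p);               %// rows of M containing all p values
    t = reshape(nonzeros(loc(ind,:).'), p, []);  %// order in which the values appear
    ind2 = all(bsxfun(@eq, t, pattern), 1);      %// true where that order is 1,2,...,p
    result(ind(ind2), k) = true;
end
result1 = any(result,2);   %// [0;0;1;1]: rows 3 and 4 of M contain some row of N
result2 = sum(result,1);   %// [2 1 0]:   N(1,:) found twice, N(2,:) once, N(3,:) never
```

Rows 3 and 4 of M both contain 4, 1, 2 in that order; only row 3 contains 4, 9, 10 in order (row 1 has 9 before 4); and no row contains 3, 5 and 9 together.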
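Answer 2: (score: 1)
I benchmarked all approaches for different matrix pairs N, M and, wherever possible, I also compared parfor against for and chose the faster one. Here are my results: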
%//Test 1: size(N) = 2263x3, size(M) = 5000x6
%//My approach (parfor): 0.650626 sec
%//Divakar's Approach 1: 1.870144 sec
%//Divakar's Approach 2: 1.164088 sec
%//Divakar's Approach 3: 0.380915 sec (with parfor)
%//Divakar's Approach 4: 2.643659 sec (gpu)
%//Luis Mendo's Approach: 1.681007 sec
%//Test 2: size(N) = 2263x3, size(M) = 25000x6
%//My approach (parfor): 6.137823 sec
%//Divakar's Approach 1: 8.342699 sec
%//Divakar's Approach 2: 5.784426 sec
%//Divakar's Approach 3: 2.251888 sec (with parfor)
%//Divakar's Approach 4: 6.272578 sec (gpu)
%//Luis Mendo's Approach: 11.514548 sec
%//Test 3: size(N) = 2100x3, size(M) = 20000x5
%//My approach (parfor): 3.417432 sec
%//Divakar's Approach 1: 5.732680 sec
%//Divakar's Approach 2: 4.107909 sec
%//Divakar's Approach 3: 1.393052 sec (with parfor)
%//Divakar's Approach 4: 3.145183 sec (gpu)
%//Luis Mendo's Approach: 5.668326 sec
%//Test 4: size(N) = 2100x3, size(M) = 324632x5
%//Divakar's Approach 3: 54.231878 sec (with parfor)
%//Divakar's Approach 4: 15.111936 sec (gpu)
%//Test 5: size(N) = 2263x3, size(M) = 1000000x6
%//Divakar's Approach 3: 210.853515 sec (with parfor)
%//Divakar's Approach 4: 49.529794 sec (gpu)
%//Divakar's Approach 5: 49.874444 sec (gpu)
%//Test 6: size(N) = 2263x3, size(M) = 5000000x6
%//Divakar's Approach 3: 1137.606244 sec (with parfor)
%//Divakar's Approach 4: stopped it after 15 min and heavy interrupts/DPCs activity
%//Divakar's Approach 5: 267.169307 sec
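Among the non-GPU approaches, Divakar's Approach 3 was by far the fastest. Its GPU counterpart only starts showing its advantage for very large numbers of rows.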
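To make that last point concrete, the ratio of the two timings can be computed from the numbers in the table above. This is just arithmetic on the reported values for Tests 1 to 5 (the tests where both Approach 3 and Approach 4 finished); the variable names are illustrative.

```matlab
%// Reported timings in seconds, Tests 1-5, copied from the results above.
rowsM = [5000 25000 20000 324632 1000000];            %// rows of M in each test
cpu3  = [0.380915 2.251888 1.393052 54.231878 210.853515];  %// Approach 3 (parfor)
gpu4  = [2.643659 6.272578 3.145183 15.111936 49.529794];   %// Approach 4 (gpu)

%// Ratio > 1 means the GPU version was faster in that test.
speedup = cpu3 ./ gpu4;
gpuWins = rowsM(speedup > 1);
```

For the three smaller tests the ratio stays below 1, and it rises above 3 only for the tests with 324632 and 1000000 rows.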