I make a random 2D array using

import numpy as np
from itertools import combinations
n = 50
A = np.random.randint(2, size=(n,n))
I want to determine whether the matrix has two pairs of rows which sum to the same row vector. I am looking for a fast way to do this. My current approach just tries all the possibilities:
for pair in combinations(combinations(range(n), 2), 2):
    if np.array_equal(A[pair[0][0]] + A[pair[0][1]], A[pair[1][0]] + A[pair[1][1]]):
        print("Pair found", pair)
A method that works for n = 100 would be brilliant.
Answer 0 (score: 4)
Based on the code in your question, and assuming you are indeed looking for pairs of pairs of rows whose sums equal the same row vector, you could do something like this:
def findMatchSets(A):
    B = A.transpose()
    pairs = tuple(combinations(range(len(A[0])), 2))
    matchSets = [[i for i in pairs if B[0][i[0]] + B[0][i[1]] == z] for z in range(3)]
    for c in range(1, len(A[0])):
        matchSets = [[i for i in block if B[c][i[0]] + B[c][i[1]] == z] for z in range(3) for block in matchSets]
        matchSets = [block for block in matchSets if len(block) > 1]
        if not matchSets:
            return []
    return matchSets
This basically stratifies the matrix into equivalence sets whose sums are equal after considering one column, then two columns, then three, and so on, until it either reaches the last column or no equivalence set with more than one member remains (i.e. there is no such pair of pairs). This works fine for 100x100 arrays, mostly because the odds of two pairs of rows summing to the same row vector are very small when n is large (n(n-1)/2 pairs of rows versus 3^n possible vector sums).
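As a quick illustration (a usage sketch, not part of the original answer), each returned block is a list of row-index pairs whose sums agree in every column:

A = np.random.randint(2, size=(100, 100))
for block in findMatchSets(A):
    print("Row pairs with identical sum vectors:", block)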
Update
Updated the code to allow searching for pairs of n-sized subsets of all rows, as requested. The default is n=2, as per the original question:
def findMatchSets(A, n=2):
    B = A.transpose()
    pairs = tuple(combinations(range(len(A[0])), n))
    matchSets = [[i for i in pairs if sum([B[0][i[j]] for j in range(n)]) == z] for z in range(n + 1)]
    for c in range(1, len(A[0])):
        matchSets = [[i for i in block if sum([B[c][i[j]] for j in range(n)]) == z] for z in range(n + 1) for block in matchSets]
        matchSets = [block for block in matchSets if len(block) > 1]
        if not matchSets:
            return []
    return matchSets
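For example, to look for matching triples of rows instead of pairs (an illustrative call, assuming the A defined above):

triples = findMatchSets(A, n=3)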
Answer 1 (score: 4)
Here is a pure numpy solution; no extensive timings, but I had to push n up to 500 before I could see my cursor blink once before it completed. It is memory intensive though, and will fail due to memory requirements for much larger n. Either way, my intuition is that the odds of finding such a vector shrink for large n anyway.
import numpy as np

n = 100
A = np.random.randint(2, size=(n, n)).astype(np.int8)

def base3(a):
    """
    pack the last axis of an array in base 3
    40 base 3 numbers per uint64
    """
    S = np.array_split(a, a.shape[-1]//40 + 1, axis=-1)
    R = np.zeros(shape=a.shape[:-1] + (len(S),), dtype=np.uint64)
    for i in range(len(S)):
        s = S[i]
        r = R[..., i]
        for j in range(s.shape[-1]):
            r *= 3
            r += s[..., j].astype(np.uint64)  # cast so the in-place add stays in uint64
    return R
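# Sanity check on the packing width chosen above (added note, not in the
# original answer): 40 base-3 digits are the most that fit in one uint64,
# since 3**40 < 2**64 < 3**41
assert 3**40 < 2**64 < 3**41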
def unique_count(a):
    """returns counts of unique elements"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), int)
    np.add.at(count, inverse, 1)
    return unique, count
def voidview(arr):
    """view the last axis of an array as a void object. can be used as a faster form of lexsort"""
    return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])
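# Small illustration of voidview (hypothetical data, not in the original
# answer): identical rows collapse to identical void scalars, so np.unique
# compares whole rows in one shot
demo = np.array([[1, 2], [1, 2], [3, 4]], dtype=np.uint64)
assert np.unique(voidview(demo)).size == 2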
def has_pairs_of_pairs(A):
    # optional; convert rows to base 3
    A = base3(A)
    # precompute sums over a lower triangular set of all combinations
    rowsums = sum(A[I] for I in np.tril_indices(len(A), -1))
    # count the number of times each row occurs by sorting
    # note that this is not quite O(n log n), since the cost of handling each row is also a function of n
    unique, count = unique_count(voidview(rowsums))
    # return whether any pairs of pairs exist;
    # computing their indices is left as an exercise for the reader
    return np.any(count > 1)
from time import perf_counter

t = perf_counter()
for i in range(100):
    print(has_pairs_of_pairs(A))
print(perf_counter() - t)
EDIT: included base-3 packing; n = 2000 is now feasible, taking about 2 GB of memory and a few seconds of processing.

EDIT: added some timings; n = 100 takes only 5 ms per call on my i7m.
Answer 2 (score: 1)
Your current code does not test for pairs of rows that sum to the same value.

Assuming that is actually what you want, it is best to stick with pure numpy. This generates the indices of all rows that have an equal sum:
import numpy as np
n = 100
A = np.random.randint(2, size=(n,n))
rowsum = A.sum(axis=1)
unique, inverse = np.unique(rowsum, return_inverse = True)
count = np.zeros_like(unique)
np.add.at(count, inverse, 1)
for p in unique[count > 1]:
    print(p, np.nonzero(rowsum == p)[0])
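As a side note (my addition, not part of the original answer), the counting step can equivalently be done with np.bincount, which avoids the explicit scatter:

count = np.bincount(inverse)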
Answer 3 (score: 1)
Here is a 'lazy' approach, which scales up to n = 10000, using 'only' 4 GB of memory, and completing in about 10 s per call. Worst-case complexity is O(n^3), but for random data the expected performance is O(n^2). At first sight, it seems like you need O(n^3) operations; each row combination needs to be produced and checked at least once. But we need not look at the entire row. Rather, we can perform an early-exit strategy on the comparison of row pairs, as soon as it is clear they are of no use to us; and for random data, we will typically draw that conclusion long before we have considered all columns in a row.
import numpy as np
from functools import reduce

n = 10
# also works for non-square A
A = np.random.randint(2, size=(n*2, n)).astype(np.int8)
# force the inclusion of some hits, to keep our algorithm on its toes
##A[0] = A[1]

def base_pack_lazy(a, base, dtype=np.uint64):
    """
    pack the last axis of an array as minimal base representation
    lazily yields packed columns of the original matrix
    """
    a = np.ascontiguousarray(np.rollaxis(a, -1))
    init = np.zeros(a.shape[1:], dtype)
    # number of digits per word such that base**packing still fits in dtype without overflow
    packing = int(np.dtype(dtype).itemsize * 8 / np.log2(base))
    for columns in np.array_split(a, (len(a)-1)//packing + 1):
        yield reduce(
            lambda acc, inc: acc*base + inc,
            columns.astype(dtype),  # cast up front so the accumulator stays in dtype
            init)
def unique_count(a):
    """returns counts of unique elements"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), int)
    np.add.at(count, inverse, 1)  # note; this scatter operation requires numpy 1.8; use a sparse matrix otherwise!
    return unique, count, inverse
def has_identical_row_sums_lazy(A, combinations_index):
    """
    compute the existence of combinations of rows summing to the same vector,
    given an nxm matrix A and an index matrix specifying all combinations

    naively, we need to compute the sum of each row combination at least once, giving n^3 computations
    however, this isn't strictly required; we can lazily consider the columns, giving an early exit opportunity
    all nicely vectorized of course
    """
    multiplicity, combinations = combinations_index.shape
    # list of indices into combinations_index, denoting possibly interacting combinations
    active_combinations = np.arange(combinations, dtype=np.uint32)
    for packed_column in base_pack_lazy(A, base=multiplicity+1):  # loop over packed cols
        # compute rowsums only for a fixed number of columns at a time
        # this is O(n^2) rather than O(n^3), and after considering the first column,
        # we can typically already exclude almost all rowpairs
        partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index)
        # find duplicates in this column
        unique, count, inverse = unique_count(partial_rowsums)
        # prune those pairs which we can exclude as having different sums, based on columns inspected thus far
        active_combinations = active_combinations[count[inverse] > 1]
        # early exit; no pairs
        if len(active_combinations) == 0:
            return False
    return True
def has_identical_triple_row_sums(A):
    n = len(A)
    idx = np.array([(i, j, k)
                    for i in range(n)
                    for j in range(n)
                    for k in range(n)
                    if i < j and j < k], dtype=np.uint16)
    idx = np.ascontiguousarray(idx.T)
    return has_identical_row_sums_lazy(A, idx)

def has_identical_double_row_sums(A):
    n = len(A)
    idx = np.array(np.tril_indices(n, -1), dtype=np.int32)
    return has_identical_row_sums_lazy(A, idx)
from time import perf_counter

t = perf_counter()
for i in range(10):
    print(has_identical_double_row_sums(A))
    print(has_identical_triple_row_sums(A))
print(perf_counter() - t)
This is extended to include the computation over sums of triples of rows, as described above. For n = 100, it still takes only about 0.2 s.
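The same pattern extends to larger subsets; here is a sketch for quadruples (my extrapolation of the answer's pattern, not code from the original), using itertools.combinations in place of the nested loops:

from itertools import combinations

def has_identical_quadruple_row_sums(A):
    n = len(A)
    # index matrix of shape (4, C(n, 4)); sums of four 0/1 entries need base 5
    idx = np.array(list(combinations(range(n), 4)), dtype=np.uint16).T
    idx = np.ascontiguousarray(idx)
    return has_identical_row_sums_lazy(A, idx)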
EDIT: some cleanup; EDIT2: more cleanup
Answer 4 (score: 0)
If you only need to determine whether such a pair exists, you can do:
exists_unique = np.unique(A.sum(axis=1)).size != A.shape[0]
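A tiny illustration (hypothetical data, not from the original answer): rows 0 and 2 below share the same scalar sum, so the check fires.

A = np.array([[1, 0], [1, 1], [0, 1]])
print(np.unique(A.sum(axis=1)).size != A.shape[0])  # True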