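I am looking for a fast algorithm to solve the following problem in Python.
I have a set of N binary vectors of length M,
B[1], B[2], ..., B[N]
where length(B[i]) == M
and every entry of B[i] is 0 or 1,
for all i.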
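I would like to find all subsets of at most K <= N vectors that are almost mutually exclusive. That is, I would like to enumerate the index sets subset_idx such that:
len(subset_idx) <= K (subsets of at most K vectors)
sum(B[subset_idx], axis = 0) <= 1 in at least M - T positions (mutually exclusive)
sum(B[subset_idx], axis = 0) > 1 in at most T positions (not mutually exclusive)
Here, T is a "fudge factor" for the mutual exclusivity requirement. If T = 0, then the vectors in subset_idx must be strictly mutually exclusive.
By mutually exclusive, I mean that B[i,k] && B[j,k] = 0 for all pairs of vectors i, j in subset_idx, in almost all positions k (i.e. in at least M - T of them).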
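To make the definition concrete, here is a minimal sketch of the check for a single candidate subset. The helper name count_conflicts and the toy matrix B_toy are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def count_conflicts(B, subset_idx):
    """Return the number of positions where two or more selected vectors are 1."""
    # select the rows in the subset and add them up position by position
    column_sums = B[list(subset_idx), :].sum(axis=0)
    return int(np.sum(column_sums > 1))

# toy data (assumed for illustration): 3 binary vectors of length 4
B_toy = np.array([[1, 0, 0, 1],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])

# vectors 0 and 1 are both 1 in the last position only
conflicts = count_conflicts(B_toy, (0, 1))
acceptable_with_T0 = conflicts <= 0   # strict mutual exclusivity
acceptable_with_T1 = conflicts <= 1   # one overlapping position tolerated
```

A subset is acceptable exactly when count_conflicts(B, subset_idx) <= T, which is the same condition as having at least M - T positions with a column sum of at most 1.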
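As an example, I have included a sloppy implementation below that uses numpy and itertools. I think that both the algorithm and the implementation can be improved. I am primarily looking for a better algorithm, but any implementation-related pointers would also be appreciated.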
import numpy as np
import itertools
K = 3 #max size
T = 0 #fudge factor for mutual exclusivity requirement
B = np.array(
[[1, 1, 0, 1, 0],
[0, 0, 1, 1, 0],
[1, 1, 0, 0, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 0, 0]]
)
N, M = B.shape
# if you want to generate vectors randomly, use:
#
# np.random.seed(seed=1)
# N = 10 #number of vectors
# M = 20 #length of each vectors
# B = np.random.choice([0, 1], size = (N, M))
mutually_exclusive_subsets = []

# identify subsets that are almost mutually exclusive via exhaustive search
# (sizes 2..K inclusive, hence K + 1 as the upper bound of range)
for subset_size in range(2, K + 1):
    for subset_idx in itertools.combinations(range(N), subset_size):
        # number of positions where at most one vector in the subset is 1
        n_exclusive = np.sum(np.sum(B[list(subset_idx), :], axis=0) <= 1)
        if n_exclusive >= (M - T):
            mutually_exclusive_subsets.append(subset_idx)

print(mutually_exclusive_subsets)
# >>> [(0, 4), (1, 2), (1, 4), (2, 4), (3, 4), (1, 2, 4)]

# test that each subset identified is *almost* mutually exclusive
for exclusive_idx in mutually_exclusive_subsets:
    assert np.sum(np.sum(B[list(exclusive_idx), :], axis=0) > 1) <= T
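
One possible direction for speeding this up, sketched here rather than benchmarked: because all entries are 0 or 1, the number of conflicting positions can only grow as vectors are added to a subset, so a subset that already exceeds T conflicts cannot be extended into a valid one. The depth-first search below uses that observation to prune, and keeps running column sums instead of re-summing each candidate. The function name almost_exclusive_subsets and its structure are my own assumptions; results come back in depth-first order rather than grouped by size.

```python
import numpy as np

def almost_exclusive_subsets(B, K, T):
    """Enumerate index tuples of size 2..K with at most T conflicting positions."""
    N, M = B.shape
    found = []

    def extend(subset, column_sums, start):
        for i in range(start, N):
            new_sums = column_sums + B[i]
            # conflicts never decrease when a vector is added, so every
            # superset of a failing subset fails too: skip the whole branch
            if np.count_nonzero(new_sums > 1) > T:
                continue
            new_subset = subset + (i,)
            if len(new_subset) >= 2:
                found.append(new_subset)
            if len(new_subset) < K:
                extend(new_subset, new_sums, i + 1)

    extend((), np.zeros(M, dtype=int), 0)
    return found

# same example matrix as in the exhaustive version
B_example = np.array(
    [[1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]]
)
result = almost_exclusive_subsets(B_example, K=3, T=0)
```

The worst case is still exponential (for instance when every vector is all zeros), so this only helps when many small subsets already violate the threshold.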