Fast algorithm to find sets of boolean vectors that are almost mutually exclusive

Date: 2017-03-29 14:41:22

Tags: python algorithm numpy combinatorics

I am looking for a fast algorithm to solve the following problem in Python:

I have a set of N boolean vectors, each of length M:

B[1], B[2], ..., B[N], where length(B[i]) == M and every entry of B[i] is 0 or 1, for all i.

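I want to find all subsets of at most K vectors (K <= N) that are almost mutually exclusive. That is, I want to enumerate the index sets subset_idx such that: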

  • len(subset_idx) <= K (a subset of at most K vectors)
  • sum(B[subset_idx], axis=0) <= 1 in at least M - T positions (mutually exclusive)
  • sum(B[subset_idx], axis=0) > 1 in at most T positions (not mutually exclusive)

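Here:

  • T is a "fudge factor" for the mutual exclusivity requirement. If T = 0, then the vectors in subset_idx must be mutually exclusive.

  • By mutually exclusive, I mean that B[i,k] && B[j,k] = 0 for all pairs of vectors i, j in subset_idx, in almost all positions k (i.e., in at least M - T positions).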
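To make the condition concrete, here is a minimal sketch of a checker for a single subset. The function names `count_conflicts` and `is_almost_exclusive` are hypothetical (they are not part of the question), and the small array is only an illustration with 0-based indices.

```python
import numpy as np

def count_conflicts(B, subset_idx):
    """Number of positions where two or more of the selected vectors are 1."""
    column_sums = B[list(subset_idx)].sum(axis=0)
    return int(np.sum(column_sums > 1))

def is_almost_exclusive(B, subset_idx, T):
    """True if the selected vectors overlap in at most T positions."""
    return count_conflicts(B, subset_idx) <= T

# Illustrative data only (hypothetical example, 0-based row indices)
B = np.array([[1, 1, 0, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 1, 0, 0, 1]])

print(is_almost_exclusive(B, (1, 2), T=0))  # True: rows 1 and 2 never overlap
print(is_almost_exclusive(B, (0, 1), T=0))  # False: rows 0 and 1 overlap at position 3
print(is_almost_exclusive(B, (0, 1), T=1))  # True: one overlap is tolerated
```

Since the second and third bullet conditions partition the M positions, checking "more than 1 in at most T positions" is equivalent to checking "at most 1 in at least M - T positions", so one count is enough.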

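As an example, I have added a sloppy implementation below that uses numpy and itertools. I think both the algorithm and the implementation can be improved. I am looking for a better algorithm, but any implementation-related pointers would be appreciated.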

import numpy as np
import itertools

K = 3       #max size
T = 0       #fudge factor for mutual exclusivity requirement

B = np.array(
    [[1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]]
)

N, M = B.shape

# if you want to generate vectors randomly, use:
#
# np.random.seed(seed=1)
# N = 10      #number of vectors
# M = 20      #length of each vector
# B = np.random.choice([0, 1], size = (N, M))


mutually_exclusive_subsets = []

#identify subsets that are almost mutually exclusive via exhaustive search
for subset_size in range(2, K + 1):    #K + 1 so that subsets of size K are included
    for subset_idx in itertools.combinations(range(N), subset_size):
        n_exclusive = np.sum(np.sum(B[subset_idx,], axis=0) <= 1)
        if n_exclusive >= (M - T):
            mutually_exclusive_subsets.append(subset_idx)

print(mutually_exclusive_subsets)
# >>> [(0, 4), (1, 2), (1, 4), (2, 4), (3, 4), (1, 2, 4)]


#test that each subset identified is *almost* mutually exclusive
for exclusive_idx in mutually_exclusive_subsets:
    assert(np.sum(np.sum(B[exclusive_idx,], axis = 0) > 1) <= T)
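One possible direction for improvement, offered as a sketch rather than a definitive answer: because all entries are 0 or 1, adding a vector to a subset can never reduce the number of positions whose column sum exceeds 1. So once a partial subset has more than T conflicting positions, none of its supersets can qualify, and a depth-first search can skip that whole branch. The function name `enumerate_almost_exclusive` is hypothetical, and the sketch enumerates subsets of sizes 2 through K inclusive.

```python
import numpy as np

def enumerate_almost_exclusive(B, K, T):
    """Depth-first enumeration of index subsets (sizes 2..K) with at most T conflicts."""
    B = np.asarray(B, dtype=np.int64)
    N, M = B.shape
    results = []

    def extend(start, chosen, column_sums):
        for i in range(start, N):
            new_sums = column_sums + B[i]
            if np.count_nonzero(new_sums > 1) > T:
                continue  # prune: every superset of chosen + [i] also has > T conflicts
            new_chosen = chosen + [i]
            if len(new_chosen) >= 2:
                results.append(tuple(new_chosen))
            if len(new_chosen) < K:
                extend(i + 1, new_chosen, new_sums)

    extend(0, [], np.zeros(M, dtype=np.int64))
    return results

B = np.array(
    [[1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]]
)

subsets = enumerate_almost_exclusive(B, K=3, T=0)
print(sorted(subsets, key=lambda s: (len(s), s)))
# [(0, 4), (1, 2), (1, 4), (2, 4), (3, 4), (1, 2, 4)]
```

The worst case is still exponential (for example, when all vectors are zero, every subset qualifies), but the running column sums avoid recomputing each subset from scratch, and the pruning helps most when T is small and the vectors are dense.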

0 answers:

No answers