Question

I'm looking for a fast algorithm to solve the following problem in Python:

I have a set of N Boolean vectors of length M, B[1], B[2], ..., B[N], where length(B[i]) == M and every entry of B[i] is 0 or 1, for all i.

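I want to find all subsets of at most K <= N vectors that are almost mutually exclusive. That is, I want to enumerate the vector indices subset_idx such that: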

len(subset_idx) <= K (a subset of at most K vectors)
sum(B[subset_idx], axis = 0) <= 1 in at least M - T positions (mutually exclusive)
sum(B[subset_idx], axis = 0) > 1 in at most T positions (not mutually exclusive)

where:

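T is a "fudge factor" for the mutual exclusivity requirement. If T = 0, the vectors in subset_idx must be strictly mutually exclusive.
By mutually exclusive, I mean that B[i,k] && B[j,k] == 0 for all vectors i != j in subset_idx, at almost every position k (i.e., at least M - T of the M positions).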
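Because the condition is checked position by position, a column sum above 1 means at least one pair of vectors in the subset shares a 1 at that position. A possible implementation pointer that follows from this, sketched here under the assumption that B is a 0/1 NumPy array as in the question: all pairwise overlaps can be computed at once with a matrix product. Any pair whose overlap exceeds T can never appear together in a valid subset (a necessary condition), and for T = 0 the condition is also sufficient, so the problem reduces to finding groups of pairwise-disjoint vectors. The names `overlap`, `conflict_count` and `compatible_pairs` are illustrative, not from the original.

```python
import itertools

import numpy as np

# Example matrix from the question: N = 5 vectors (rows) of length M = 5.
B = np.array(
    [[1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]]
)
T = 0

# Cast to integers so the product counts overlaps even if B has a boolean dtype.
Bi = B.astype(np.int64)
# overlap[i, j] = number of positions k where B[i, k] and B[j, k] are both 1.
overlap = Bi @ Bi.T


def conflict_count(B, subset_idx):
    """Number of positions where more than one vector in the subset has a 1."""
    return int(np.sum(B[list(subset_idx)].sum(axis=0) > 1))


# Necessary condition for any subset: every pair inside it has overlap <= T.
compatible_pairs = [
    (i, j)
    for i, j in itertools.combinations(range(len(B)), 2)
    if overlap[i, j] <= T
]
print(compatible_pairs)  # [(0, 4), (1, 2), (1, 4), (2, 4), (3, 4)]

# Sanity check on the example: for T = 0, "all pairs disjoint" <=> "no conflict position".
for size in range(2, len(B) + 1):
    for subset in itertools.combinations(range(len(B)), size):
        pairs_ok = all(overlap[i, j] == 0 for i, j in itertools.combinations(subset, 2))
        assert pairs_ok == (conflict_count(B, subset) == 0)
```

For a pair, `overlap[i, j]` is exactly its number of conflict positions; for larger subsets it only gives a bound, because a position with three 1s is counted once as a conflict but three times across pairs.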

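As an example, I've added a sloppy implementation below using numpy and itertools. I think both the algorithm and the implementation can be improved. I'm mainly looking for a better algorithm, but any pointers on the implementation would also be appreciated.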

import numpy as np
import itertools

K = 3       #max size
T = 0       #fudge factor for mutual exclusivity requirement

B = np.array(
    [[1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]]
)

N, M = B.shape

# if you want to generate vectors randomly, use:
#
#  np.random.seed(seed=1)
# N = 10      #number of vectors
# M = 20     #length of each vectors
# B = np.random.choice([0, 1], size = (N, M))


mutually_exclusive_subsets = []

# identify subsets that are almost mutually exclusive via exhaustive search
for subset_size in range(2, K + 1):  # K is inclusive: subsets of 2..K vectors
    for subset_idx in itertools.combinations(range(N), subset_size):
        # count positions where at most one vector in the subset has a 1
        n_exclusive = np.sum(np.sum(B[list(subset_idx)], axis=0) <= 1)
        if n_exclusive >= (M - T):
            mutually_exclusive_subsets.append(subset_idx)

print(mutually_exclusive_subsets)
# >>> [(0, 4), (1, 2), (1, 4), (2, 4), (3, 4), (1, 2, 4)]


# test that each subset identified is *almost* mutually exclusive
for exclusive_idx in mutually_exclusive_subsets:
    assert np.sum(np.sum(B[list(exclusive_idx)], axis=0) > 1) <= T
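One possible algorithmic improvement, sketched under the assumption that the conditions are exactly the ones listed above: adding a vector to a subset can only raise column sums, so the set of conflict positions (column sum > 1) never shrinks. Once a partial subset has more than T conflicts, none of its supersets can qualify, which is the same downward-closure idea used for pruning in Apriori-style itemset mining. A depth-first search over increasing indices can therefore abandon a branch early instead of testing every combination. The sketch also packs each vector into a Python int so each step is two bitwise operations. All function names here are hypothetical. The worst case is still exponential (for example, if every vector is all zeros, every subset qualifies), but when conflicts are common the pruning can skip large parts of the search.

```python
import itertools

import numpy as np


def to_bitmask(row):
    """Pack a 0/1 vector into a Python int (bit k set iff row[k] == 1)."""
    mask = 0
    for k, bit in enumerate(row):
        if bit:
            mask |= 1 << k
    return mask


def popcount(x):
    # bin(x).count("1") works on any Python 3; int.bit_count() needs 3.10+.
    return bin(x).count("1")


def almost_exclusive_subsets(B, K, T):
    """Enumerate index tuples of size 2..K with at most T conflict positions.

    Adding a vector never removes a conflict position, so a branch is pruned as
    soon as its conflict count exceeds T.
    """
    masks = [to_bitmask(row) for row in B]
    n = len(masks)
    results = []

    def extend(start, subset, seen, conflict):
        # seen: positions where at least one chosen vector has a 1
        # conflict: positions where at least two chosen vectors have a 1
        for i in range(start, n):
            new_conflict = conflict | (seen & masks[i])
            if popcount(new_conflict) > T:
                continue  # prune: every superset of this subset also fails
            new_subset = subset + (i,)
            if len(new_subset) >= 2:
                results.append(new_subset)
            if len(new_subset) < K:
                extend(i + 1, new_subset, seen | masks[i], new_conflict)

    extend(0, (), 0, 0)
    return results


def brute_force(B, K, T):
    """Exhaustive reference, equivalent to the loop in the question with K inclusive."""
    return [
        s
        for size in range(2, K + 1)
        for s in itertools.combinations(range(len(B)), size)
        if np.sum(B[list(s)].sum(axis=0) > 1) <= T
    ]


EXAMPLE_B = np.array(
    [[1, 1, 0, 1, 0],
     [0, 0, 1, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0]]
)
found = almost_exclusive_subsets(EXAMPLE_B, K=3, T=0)
assert sorted(found) == sorted(brute_force(EXAMPLE_B, 3, 0))
print(sorted(found, key=lambda s: (len(s), s)))
# [(0, 4), (1, 2), (1, 4), (2, 4), (3, 4), (1, 2, 4)]
```

The search returns subsets in depth-first order rather than grouped by size, hence the sort before printing. Combining it with the pairwise-overlap filter (skipping any i whose overlap with an already chosen vector exceeds T) would be a further, optional refinement.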

Fast algorithm for finding sets of almost mutually exclusive Boolean vectors
