Question

I'm implementing the Apriori algorithm, and I'm currently stuck on creating 3-itemsets (groups of three words).

Suppose I have a list of 2-itemsets (pairs of words) like this:

FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')];

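My first approach was to split everything into single words and apply itertools.combinations with r=3, but that is computationally expensive and not the right approach, because the result should be built from the itemsets in C2.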
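For comparison, here is a minimal sketch of that first approach, using the FI2 list from above: every 3-combination of the distinct items becomes a candidate, whether or not its pairs appear in FI2. The names `items` and `brute_force_C3` are illustrative only. With six distinct items this already yields C(6, 3) = 20 candidates against the four expected, which is why it scales poorly.

```python
from itertools import combinations

FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')]

# Flatten the 2-itemsets into the sorted list of distinct single items.
items = sorted({item for pair in FI2 for item in pair})

# Every 3-combination of those items becomes a candidate, ignoring
# whether its 2-item subsets were in FI2 at all.
brute_force_C3 = list(combinations(items, 3))

print(items)                # ['a', 'b', 'c', 'd', 'e', 'f']
print(len(brute_force_C3))  # 20
```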

The result should be:

C3 = [('a','b','c'),('a','b','d'),('a','c','d'),('b','d','e')]

I'm having trouble figuring out how to approach this, and I would appreciate some guidance on how to do it.
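The expected C3 above matches the join step of standard Apriori candidate generation: after sorting, two (k-1)-itemsets are merged only when they agree on their first k-2 items (for k = 3, just the first item). A minimal sketch, where `apriori_join` is a hypothetical helper name and each input tuple is assumed to contain no repeated items:

```python
from itertools import combinations

FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')]

def apriori_join(frequent, k):
    """Join step: merge two sorted (k-1)-itemsets that share their first k-2 items."""
    # Sort items inside each tuple, deduplicate, then sort the tuples themselves.
    sorted_sets = sorted({tuple(sorted(s)) for s in frequent})
    candidates = []
    for x, y in combinations(sorted_sets, 2):
        # Same (k-2)-prefix; because the list is sorted, x[-1] < y[-1].
        if x[:k - 2] == y[:k - 2]:
            candidates.append(x + (y[-1],))
    return candidates

C3 = apriori_join(FI2, 3)
print(C3)
# [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'd', 'e')]
```

Pairs such as ('b','e') and ('e','f') are never merged because their first items differ, which is why ('b','e','f') does not appear in this result.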

Answer 1

Any chance C3 is missing some values? ('b','e','f'), ('a','b','e')

I'm sure this isn't the best way, but it's a start:

from itertools import combinations

FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')]

def check_intersection(pair):
    """Return True if the two tuples in `pair` have at least one item in common."""
    return len(set(pair[0]) & set(pair[1])) > 0

# Run over all pairs of FI2 itemsets; whenever two of them share at least one
# item, add their merged, sorted union. Building a set removes duplicate tuples,
# and sorting the result makes the printed order deterministic.
C3 = sorted({tuple(sorted(set(c[0] + c[1]))) for c in combinations(FI2, 2) if check_intersection(c)})

print(C3)
#=> [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'), ('a', 'c', 'd'), ('b', 'd', 'e'), ('b', 'e', 'f')]
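Merging any two overlapping pairs produces more candidates than the question expects. Standard Apriori follows the join with a prune step that discards any candidate having a (k-1)-subset that is not frequent. A minimal sketch, assuming candidates are tuples with their items already sorted; `apriori_prune` is a hypothetical helper name, and the six candidates printed above are hardcoded as input:

```python
from itertools import combinations

FI2 = [('a','b'),('a','c'),('a','d'),('b','d'),('b','e'),('e','f')]
# Candidate 3-itemsets (copied from the merged-pairs output above).
candidates = [('a','b','c'), ('a','b','d'), ('a','b','e'),
              ('a','c','d'), ('b','d','e'), ('b','e','f')]

def apriori_prune(candidates, frequent):
    """Keep only candidates whose every (k-1)-subset is a frequent itemset."""
    frequent_set = {tuple(sorted(s)) for s in frequent}
    kept = []
    for cand in candidates:
        k = len(cand)
        # combinations() of a sorted tuple yields sorted subsets,
        # so they can be looked up directly in frequent_set.
        if all(sub in frequent_set for sub in combinations(cand, k - 1)):
            kept.append(cand)
    return kept

print(apriori_prune(candidates, FI2))
# [('a', 'b', 'd')]
```

Applied to the question's expected C3, the same check would also drop ('a','b','c'), ('a','c','d') and ('b','d','e'), since ('b','c'), ('c','d') and ('d','e') are not in FI2. Whatever survives still needs its support counted against the transactions.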
