Set intersection with a custom comparator?

Date: 2018-05-14 17:27:07

Tags: python

I have two sets of strings:

set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}

I want to find the intersection of these two sets, but with element equality defined as follows:

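def match(string_a, string_b, threshold=0.8):
    # lcs() returns the LCS length of the two strings.
    lcs_len = lcs(string_a, string_b)
    # Compare the LCS length, relative to the longer string, against the threshold.
    return (lcs_len / max(len(string_a), len(string_b))) > threshold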

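Basically, if the LCS is at least 80% of the length of the longer string, we consider that a "good enough" match. I know a custom comparator can be passed to the sort methods in a similar way, but I have not found any custom comparator support in the set operations.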
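The question does not show the `lcs` helper. A minimal sketch of one possible implementation follows, assuming `lcs` means the length of the longest common subsequence (it could equally mean longest common substring, which the question does not specify):

```python
def lcs(a, b):
    """Length of the longest common subsequence of a and b.

    Assumption: this is what the question's `lcs` helper computes.
    """
    # prev[j] is the LCS length of the already processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping a character on either side.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Under this assumption, `'abcd'` and `'abce'` share an LCS of length 3, a ratio of 0.75, so with a threshold of 0.8 only `'ghij'` from the example sets would count as a match.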

2 answers:

Answer 0 (score: 3)

You can iterate over the Cartesian product of both sets, then keep the elements that are in both sets and satisfy your predicate:

from itertools import product
{i for i, j in product(set_a, set_b) if i in set_b and match(i, j)}
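Note that the `i in set_b` test limits the result to strings that appear verbatim in both sets. If the goal is instead to keep every element of `set_a` that fuzzily matches something in `set_b`, that test would presumably be dropped. A minimal sketch of that variant, where `fuzzy_intersection` and the toy predicate `same_prefix` are hypothetical names used only for illustration (the real predicate would be the question's `match`):

```python
from itertools import product


def fuzzy_intersection(set_a, set_b, match):
    """Return the elements of set_a that match at least one element of set_b."""
    return {a for a, b in product(set_a, set_b) if match(a, b)}


def same_prefix(a, b):
    # Toy stand-in for the question's match(): equal first three characters.
    return a[:3] == b[:3]


set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}
result = fuzzy_intersection(set_a, set_b, same_prefix)
print(sorted(result))  # ['abcd', 'efgh', 'ghij']
```

The result is asymmetric: it contains strings from `set_a` only, and the predicate is evaluated for every pair, so the cost grows with `len(set_a) * len(set_b)`.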

Answer 1 (score: 0)

Using a comprehension and adding a helper function:

def match_with_set(string_a, set_b, threshold=0.8):
    # True as soon as any element of set_b matches string_a.
    return any(match(string_a, string_b, threshold) for string_b in set_b)

# Keep every element of set_a that matches at least one element of set_b.
intersection_set = {string for string in set_a if match_with_set(string, set_b)}