I have two sets of strings:
set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}
I want to find the intersection of these two sets, but with element equality defined as follows:
def match(string_a, string_b, threshold=0.8):
    # fraction of the longer string covered by the LCS
    lcs_len = lcs(string_a, string_b)
    return (lcs_len / max(len(string_a), len(string_b))) >= threshold
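Basically, if the LCS is at least 80% of the length of the longer string, we consider the two strings a "good enough" match. I know that a custom comparator can be passed to sorting methods, but I haven't found any way to supply a custom comparator to set operations.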
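The question does not show the lcs() helper. One possible definition, assuming "lcs" means the length of the longest common subsequence (a longest common substring would give different ratios), is the classic dynamic-programming table kept one row at a time. match() below is the question's predicate with the threshold parameter honored and an added guard for two empty strings (that guard is an assumption).

```python
def lcs(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b, O(len(a) * len(b))."""
    # prev[j] = LCS length of the part of a processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def match(string_a: str, string_b: str, threshold: float = 0.8) -> bool:
    """True if the LCS covers at least `threshold` of the longer string."""
    longest = max(len(string_a), len(string_b))
    if longest == 0:  # two empty strings: treat as a match (assumption)
        return True
    return lcs(string_a, string_b) / longest >= threshold


print(lcs('abcd', 'abce'))                     # 3
print(match('abcd', 'abce'))                   # False (3/4 = 0.75 < 0.8)
print(match('abcd', 'abce', threshold=0.75))   # True
```

With the default threshold of 0.8, 'abcd' and 'abce' do not match (3/4 = 0.75), so on the sample data only 'ghij' would count as shared; lowering the threshold to 0.75 admits the near-duplicates.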
Answer 0 (score: 3)
You can iterate over the Cartesian product of both sets and keep the elements of set_a that satisfy your predicate with at least one element of set_b:
from itertools import product
{i for i, j in product(set_a, set_b) if match(i, j)}
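A self-contained sketch of the Cartesian-product approach, run on the sets from the question. The lcs() here is an assumption (longest common subsequence, computed with a rolling DP row), and fuzzy_intersection is a hypothetical wrapper name, not part of the answer.

```python
from itertools import product


def lcs(a, b):
    # Longest common subsequence length (assumed meaning of "lcs"), rolling DP row
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def match(string_a, string_b, threshold=0.8):
    longest = max(len(string_a), len(string_b))
    return longest == 0 or lcs(string_a, string_b) / longest >= threshold


def fuzzy_intersection(set_a, set_b, threshold=0.8):
    # Hypothetical helper: keep each element of set_a that matches
    # at least one element of set_b (every pair is tested)
    return {i for i, j in product(set_a, set_b) if match(i, j, threshold)}


set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}
print(fuzzy_intersection(set_a, set_b))                          # {'ghij'}
print(sorted(fuzzy_intersection(set_a, set_b, threshold=0.75)))  # ['abcd', 'efgh', 'ghij']
```

The result contains only elements of set_a, so the operation is not symmetric: at threshold=0.75, swapping the arguments yields {'abce', 'efgk', 'ghij'}. The predicate is also evaluated for all len(set_a) * len(set_b) pairs.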
Answer 1 (score: 0)
Using a list comprehension and a helper function:
def match_with_set(string_a, set_b, threshold=0.8):
    # True as soon as string_a matches any element of set_b
    for string in set_b:
        if match(string_a, string, threshold):
            return True
    return False
intersection_set = set([string for string in set_a if match_with_set(string, set_b)])
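The helper can also be written with the built-in any(), which stops scanning set_b at the first match instead of testing every pair. This is a sketch: lcs() and match() are restated so the snippet runs on its own, and lcs() again assumes the longest common subsequence. The final assert checks that it agrees with filtering the Cartesian product.

```python
from itertools import product


def lcs(a, b):
    # Longest common subsequence length (assumed meaning of "lcs"), rolling DP row
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def match(string_a, string_b, threshold=0.8):
    longest = max(len(string_a), len(string_b))
    return longest == 0 or lcs(string_a, string_b) / longest >= threshold


def match_with_set(string_a, set_b, threshold=0.8):
    # any() short-circuits on the first element of set_b that matches string_a
    return any(match(string_a, s, threshold) for s in set_b)


set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}

intersection_set = {s for s in set_a if match_with_set(s, set_b)}
print(intersection_set)  # {'ghij'}

# Same result as filtering the Cartesian product, but with an early exit per element
loose = {s for s in set_a if match_with_set(s, set_b, threshold=0.75)}
assert loose == {i for i, j in product(set_a, set_b) if match(i, j, 0.75)}
print(sorted(loose))  # ['abcd', 'efgh', 'ghij']
```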