I have the following input:
list1 = [['Search','engines','using','machine','learning','pattern','detections'],['machine','learning','helped','Google','automatically','sift','pages']]
list2 = ['Machine','learning','ever','evolving','technology']
I tried the following code:
def jaccard_similarity(list1, list2):
    intersection = len(list(set(list1).intersection(list2)))
    print(list(set(list1).intersection(list2)))
    union = (len(list1) + len(list2)) - intersection
    return float(intersection / union)
jaccard_similarity(input_list,input_list1)
and got the following error:
TypeError: unhashable type: 'list'
Answer 0 (score: 2)
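I believe what you want is the jaccard_similarity for each list in list1. The error comes from set(list1): list1 is a list of lists, and lists are unhashable, so they cannot be set elements. If that is what you want, loop over the sublists. I also made a few small corrections to the lines in jaccard_similarity.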
list1 = [
    ['Search','engines','using','machine','learning','pattern','detections'],
    ['machine','learning','helped','Google','automatically','sift','pages']
]
list2 = ['Machine','learning','ever','evolving','technology']

def jaccard_similarity(list1, list2):
    intersection = len(set(list1).intersection(list2))  # no need to call list here
    union = len(list1 + list2) - intersection  # you only need to call len once here
    return intersection / union  # also no need to cast to float as this will be done for you

for l in list1:
    print(jaccard_similarity(l, list2))
Or as a list comprehension:
similarities = [jaccard_similarity(l, list2) for l in list1]
Edit: by the way, here is a much simpler way to get the jaccard_similarity:
def jaccard_similarity(list1, list2):
    s1, s2 = set(list1), set(list2)
    return len(s1 & s2) / len(s1 | s2)
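One thing worth checking is that the two versions above only agree when neither list contains repeated words. The length-based union counts duplicates, while the set-based one does not. A minimal sketch of the difference follows; the function names jaccard_len_based and jaccard_set_based and the sample lists a and b are made up for this illustration and are not from the question.

```python
# Hypothetical names, used only to compare the two versions side by side.
def jaccard_len_based(list1, list2):
    # Union size derived from list lengths, so repeated words are counted.
    intersection = len(set(list1).intersection(list2))
    union = len(list1 + list2) - intersection
    return intersection / union


def jaccard_set_based(list1, list2):
    # Union size derived from sets, so repeated words are ignored.
    s1, s2 = set(list1), set(list2)
    return len(s1 & s2) / len(s1 | s2)


# Illustrative data with a repeated word in the first list.
a = ['machine', 'learning', 'learning']
b = ['learning', 'technology']

print(jaccard_len_based(a, b))  # 1 / (5 - 1) = 0.25
print(jaccard_set_based(a, b))  # 1 / 3
```

With the lists from the question there are no repeated words, so both versions return the same values. Note also that the set-based version raises ZeroDivisionError if both inputs are empty.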
Answer 1 (score: 1)
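You can use your function to compute the Jaccard index between two flat lists, for example the first sublist of list1 and list2: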
jaccard_similarity(list1[0], list2)
which returns:
['learning']
Out[7]: 0.09090909090909091
You can also use a loop to apply the function to each sublist in list1 and get the Jaccard index between that sublist and list2.
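The loop described above could look roughly like this. It is only a sketch: it assumes the set-based jaccard_similarity from the other answer rather than the function in the question, and the scores dictionary is a name introduced here for illustration.

```python
def jaccard_similarity(list1, list2):
    # Set-based version (assumed here); it does not print the intersection.
    s1, s2 = set(list1), set(list2)
    return len(s1 & s2) / len(s1 | s2)


list1 = [
    ['Search', 'engines', 'using', 'machine', 'learning', 'pattern', 'detections'],
    ['machine', 'learning', 'helped', 'Google', 'automatically', 'sift', 'pages'],
]
list2 = ['Machine', 'learning', 'ever', 'evolving', 'technology']

# Map each sublist's position in list1 to its Jaccard index against list2.
scores = {}
for index, sublist in enumerate(list1):
    scores[index] = jaccard_similarity(sublist, list2)
    print(index, scores[index])
```

Both sublists share only 'learning' with list2, because the comparison is case-sensitive and 'Machine' does not match 'machine', so each score is 1/11.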