def jaccard_distance(x, y):
    intersection_cardinality = len(set.intersection(*[set(x), set(y)]))
    union_cardinality = len(set.union(*[set(x), set(y)]))
    return intersection_cardinality / float(union_cardinality)
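For some reason, when I run it on my data matrix, it returns 1 for every pair. Does anyone know what I am doing wrong?
Answer 0 (score: 1)
This code seems to work: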
def jaccard_index(x, y):
    # Ratio of shared elements to all distinct elements in x and y
    intersection_cardinality = len(set.intersection(*[set(x), set(y)]))
    union_cardinality = len(set.union(*[set(x), set(y)]))
    return intersection_cardinality / float(union_cardinality)

jaccard_index((1, 2, 3), (3, 4, 5))  # returns 0.2
jaccard_index((1, 2, 3), (6, 4, 5))  # returns 0.0
jaccard_index((1, 2, 3), (1, 2, 3))  # returns 1.0
This expression:
set.intersection(*[set(x), set(y)])
is needlessly convoluted and can be simplified to:
set(x) & set(y) # or set(x).intersection(set(y))
Under Python 3, the cast to float is not needed, because / always performs true division:
>>> 1 / 3
0.3333333333333333
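Putting both simplifications together, the function could be sketched roughly as follows. The guard for two empty inputs (returning 1.0) is an assumption added here to avoid a ZeroDivisionError; it is not part of the original code, and a different convention may suit your data better.

```python
def jaccard_index(x, y):
    """Jaccard index |X & Y| / |X | Y| of two iterables of hashable items."""
    a, b = set(x), set(y)
    union = a | b
    if not union:
        # Assumption (not in the original code): two empty inputs count as identical.
        return 1.0
    # True division in Python 3, so no float() cast is required.
    return len(a & b) / len(union)
```

Note that this value is the Jaccard index (a similarity); the Jaccard distance is 1 minus it.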
The way the initial lists are built is complicated and error-prone. The following should be a working replacement:
import csv

numbers = []
with open('jarticl.csv', 'rt', errors='replace') as f:
    reader = csv.reader(f)
    for row in reader:
        # Keep columns 1 through 5 of each row (column 0 is skipped)
        numbers.append(tuple(row[i] for i in range(1, 6)))
# Transpose the rows into one tuple per column
art1, art2, art3, art4, art5 = zip(*numbers)
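Once the columns are available, every pair can be compared in one pass. The sketch below is only an illustration: the in-memory sample data stands in for jarticl.csv (whose real contents are not shown in the question), and the names sample, columns and scores are hypothetical.

```python
import csv
import io
from itertools import combinations

# Hypothetical stand-in for jarticl.csv: column 0 is skipped, columns 1-5 are compared.
sample = io.StringIO(
    "r1,a,a,b,c,d\n"
    "r2,b,b,c,d,e\n"
    "r3,c,d,d,e,f\n"
)

# Read columns 1 through 5 of each row, then transpose rows into columns.
numbers = [tuple(row[1:6]) for row in csv.reader(sample)]
columns = list(zip(*numbers))


def jaccard_index(x, y):
    # Assumes at least one of x, y is non-empty (otherwise the union is empty).
    a, b = set(x), set(y)
    return len(a & b) / len(a | b)


# Jaccard index for every unordered pair of columns, keyed by column positions.
scores = {
    (i, j): jaccard_index(columns[i], columns[j])
    for i, j in combinations(range(len(columns)), 2)
}

for pair, score in sorted(scores.items()):
    print(pair, score)
```

If every score printed this way is 1.0 on the real file, the columns contain the same set of distinct values, which is worth checking before suspecting the function itself.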