Jaccard index returns all 1s

Posted: 2016-04-12 22:53:44

Tags: python

def jaccard_distance(x,y):
  intersection_cardinality = len(set.intersection(*[set(x), set(y)]))
  union_cardinality = len(set.union(*[set(x), set(y)]))
  return intersection_cardinality/float(union_cardinality)

For some reason, when I run this on my data matrix, it returns 1 for every pair. Does anyone know what I'm doing wrong?

1 answer:

Answer 0: (score: 1)

This code seems to work:

def jaccard_index(x,y):
  intersection_cardinality = len(set.intersection(*[set(x), set(y)]))
  union_cardinality = len(set.union(*[set(x), set(y)]))
  return intersection_cardinality/float(union_cardinality)


jaccard_index((1, 2, 3), (3, 4, 5))   # returns 0.2
jaccard_index((1, 2, 3), (6, 4, 5))   # returns 0.0
jaccard_index((1, 2, 3), (1, 2, 3))   # returns 1.0
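
Since the function itself behaves correctly on these inputs, the all-1s result probably comes from the data. One possible cause, and this is only a guess because the question does not show the matrix: if each row is a binary 0/1 indicator vector, set(x) collapses to {0, 1} for almost every row, so every pair looks identical. The sketch below illustrates that effect; jaccard_binary is a hypothetical helper, not part of the original code.

```python
def jaccard_index(x, y):
    a, b = set(x), set(y)
    return len(a & b) / float(len(a | b))


def jaccard_binary(x, y):
    """Jaccard index for equal-length 0/1 vectors (hypothetical helper).

    Counts positions where both vectors are 1, divided by positions
    where at least one of them is 1.
    """
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Assumption: two all-zero vectors are treated as identical.
    return both / float(either) if either else 1.0


row_a = [1, 0, 1, 0]
row_b = [0, 1, 0, 1]

# Both rows reduce to the set {0, 1}, so the set-based version sees no difference.
print(jaccard_index(row_a, row_b))
# The position-wise version sees no shared 1s.
print(jaccard_binary(row_a, row_b))
```

If the data really are lists of item identifiers rather than 0/1 flags, this does not apply and the set-based function is the right one.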

About the code

This expression:

set.intersection(*[set(x), set(y)])

is needlessly complicated and can be simplified to:

set(x) & set(y)  # or set(x).intersection(set(y))
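
Applied to the whole function, the same simplification might look like the sketch below. The guard for an empty union is an addition of mine, not something the original code had; returning 1.0 in that case is an assumption about what two empty inputs should mean.

```python
def jaccard_index(x, y):
    """Jaccard index of two iterables, each treated as a set."""
    a, b = set(x), set(y)
    union = a | b
    if not union:
        # Assumption: two empty inputs count as identical.
        return 1.0
    return len(a & b) / float(len(union))


# The operator form gives the same sets as the original unpacking form.
x, y = (1, 2, 3), (3, 4, 5)
same_intersection = (set(x) & set(y)) == set.intersection(*[set(x), set(y)])
same_union = (set(x) | set(y)) == set.union(*[set(x), set(y)])
print(same_intersection, same_union)
print(jaccard_index(x, y))
```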

Under Python 3, the conversion to float is unnecessary, because / always performs true division:

>>> 1 / 3
0.3333333333333333

The way the initial lists are built is complicated and error-prone. The following should replace that code:

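import csv

numbers = []
with open('jarticl.csv', 'rt', errors='replace') as f:
    reader = csv.reader(f)
    for row in reader:
        # Keep columns 1 to 5 of each row (column 0 is skipped).
        numbers.append(tuple(row[i] for i in range(1, 6)))

# Transpose the rows so that each variable holds one column.
art1, art2, art3, art4, art5 = zip(*numbers)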
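
To check the result end to end, the columns can be compared pairwise. The sketch below reads from an in-memory string instead of jarticl.csv, and the sample rows are made up purely for illustration, so the layout (an id in column 0 followed by five article columns) is an assumption about the real file.

```python
import csv
import io
from itertools import combinations


def jaccard_index(x, y):
    a, b = set(x), set(y)
    return len(a & b) / float(len(a | b))


# Hypothetical stand-in for the CSV file: column 0 is an id, columns 1-5 are data.
sample = io.StringIO(
    "r1,a,a,x,a,p\n"
    "r2,b,b,y,c,q\n"
    "r3,c,d,z,b,r\n"
)

numbers = [tuple(row[1:6]) for row in csv.reader(sample)]
names = ["art1", "art2", "art3", "art4", "art5"]
columns = dict(zip(names, zip(*numbers)))

# Jaccard index for every unordered pair of columns.
scores = {
    (n1, n2): jaccard_index(columns[n1], columns[n2])
    for n1, n2 in combinations(names, 2)
}

for pair, score in sorted(scores.items()):
    print(pair, score)
```

Note that csv.reader yields strings, so values such as "1" and "1.0" count as different set elements unless they are converted first.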