Cosine similarity between lists is not being computed

Time: 2019-08-11 18:20:11

Tags: python apply cosine-similarity

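I am currently trying to compute the similarity between lists and add a new column to my DataFrame showing the result. However, when I try this, it returns 0 for every entry in my data (as shown below):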


        list1                   list2                   similarity
[action, adventure,...]   [[zoe_saldana, action,...],..]    [0.0, 0.0,...]
         ...                     ...                       ...

This is the code I am using:

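import math
from collections import Counter


def counter_cosine_similarity(c1, c2):
    # Cosine similarity between two Counter objects.
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    try:
        return dotprod / (magA * magB)
    except ZeroDivisionError:
        # Either Counter is empty; the function then returns None.
        pass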


#SIMILARITY#
def get_similarity(row):
    similarities = []
    for idx, list_of_lists in enumerate(row['list1']):
        for l1 in list_of_lists:
            counter_list1 = Counter(l1)
            counter_list2 = Counter(row['list2'])
            similarities.append(counter_cosine_similarity(counter_list1, counter_list2))

    return similarities

frame['similarity'] = frame.apply(lambda row: get_similarity(row), axis=1)

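I have been trying to understand what is happening, but I have not reached a conclusion yet, especially because the similarity list contains the correct values when I do it the following way: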

similarity = []
for idx, list_of_lists in enumerate(frame['list1']):
    for l1 in list_of_lists:
        counter1 = Counter(l1)
        for idx1, l2 in enumerate(frame['list2']):
            counter2 = Counter(l2)
            if idx == idx1:
                similarity.append(counter_cosine_similarity(counter1, counter2))


If anyone can help, I would really appreciate it!

1 answer:

Answer 0: (score: 0)

Since you get the correct similarities when you append them to a list, is there any reason you can't simply do this:

frame['similarity'] = similarity
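
A possible explanation for the zeros, offered as a sketch rather than a confirmed diagnosis: it assumes, as the working loop implies, that each list1 cell holds a list of token lists and each list2 cell holds a flat list of tokens. Under that assumption, the apply version loops one level too deep, so Counter(l1) counts the characters of a single token and shares no keys with the token Counter built from list2. The name row_similarities and the sample row are hypothetical, and a plain dict stands in for a DataFrame row so the example runs without pandas.

```python
import math
from collections import Counter


def counter_cosine_similarity(c1, c2):
    """Cosine similarity between two Counters; 0.0 if either one is empty."""
    terms = set(c1) | set(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    mag_a = math.sqrt(sum(v * v for v in c1.values()))
    mag_b = math.sqrt(sum(v * v for v in c2.values()))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dotprod / (mag_a * mag_b)


def row_similarities(row):
    """Compare every token list in row['list1'] with the flat row['list2'].

    Hypothetical helper: assumes list1 is a list of token lists and
    list2 is a flat list of tokens.
    """
    counter_list2 = Counter(row["list2"])
    # A single loop: each element of row['list1'] is already a token list.
    return [
        counter_cosine_similarity(Counter(tokens), counter_list2)
        for tokens in row["list1"]
    ]


# Hypothetical data standing in for one DataFrame row.
row = {
    "list1": [["action", "adventure"], ["drama", "romance"]],
    "list2": ["action", "drama"],
}

# One level too deep: a Counter over the characters of one token
# has no keys in common with a Counter over whole tokens.
too_deep = counter_cosine_similarity(Counter("action"), Counter(row["list2"]))

per_row = row_similarities(row)
print(too_deep)
print(per_row)
```

If the assumption holds, the same function could be passed to frame.apply(row_similarities, axis=1), which keeps one list of scores per row. The flat similarity list built by the loop only lines up with the frame when every list1 cell contains exactly one sub-list.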