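I am trying to compute the similarity between lists and store the result in a new column of my DataFrame. However, when I do this, it returns 0 for every entry in my data (as shown below):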
list1 list2 similarity
[action, adventure,...] [[zoe_saldana, action,...],..] [0.0, 0.0,...]
... ... ...
This is the code I am using:
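import math
from collections import Counter

def counter_cosine_similarity(c1, c2):
    # c1 and c2 are Counters mapping term -> frequency
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    try:
        return dotprod / (magA * magB)
    except ZeroDivisionError:
        # falls through and returns None when either Counter is empty
        pass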
#SIMILARITY#
def get_similarity(row):
    similarities = []
    for idx, list_of_lists in enumerate(row['list1']):
        for l1 in list_of_lists:
            counter_list1 = Counter(l1)
            counter_list2 = Counter(row['list2'])
            similarities.append(counter_cosine_similarity(counter_list1, counter_list2))
    return similarities

frame['similarity'] = frame.apply(lambda row: get_similarity(row), axis=1)
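I have been trying to understand what is going on, but I have not reached a conclusion, especially because the similarity list contains the correct values when I compute it like this: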
similarity = []
for idx, list_of_lists in enumerate(frame['list1']):
    for l1 in list_of_lists:
        counter1 = Counter(l1)
        for idx1, l2 in enumerate(frame['list2']):
            counter2 = Counter(l2)
            if idx == idx1:
                similarity.append(counter_cosine_similarity(counter1, counter2))
If anyone can help, I would really appreciate it!
Answer 0 (score: 0)
Since you already get the correct similarities when you append them to a list, is there any reason you can't simply do this:
frame['similarity'] = similarity
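Note that assigning the flat list only lines up if it ends up with exactly one value per row. If the row-wise version is still wanted, one likely cause of the zeros is the extra loop in get_similarity. This is an assumption based on the working loop, where each list1 cell appears to be a list of token lists and each list2 cell a flat list of tokens. Inside apply, row['list1'] is already a single cell, so the inner loop walks over individual strings and Counter(l1) ends up counting characters, which never match the words in list2. A minimal sketch of a version with a single loop follows; the toy rows are hypothetical, and plain dicts stand in for DataFrame rows so the block runs without pandas:

```python
import math
from collections import Counter


def counter_cosine_similarity(c1, c2):
    # Cosine similarity between two Counters; 0.0 if either one is empty.
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0) ** 2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0) ** 2 for k in terms))
    if magA == 0 or magB == 0:
        return 0.0
    return dotprod / (magA * magB)


def get_similarity(row):
    # Assumption: row['list1'] is a list of token lists and
    # row['list2'] is a flat list of tokens, so one loop is enough.
    counter_list2 = Counter(row['list2'])
    return [counter_cosine_similarity(Counter(l1), counter_list2)
            for l1 in row['list1']]


# Hypothetical toy rows standing in for DataFrame rows.
rows = [
    {'list1': [['action', 'adventure'], ['drama']],
     'list2': ['zoe_saldana', 'action']},
    {'list1': [['comedy']],
     'list2': ['comedy']},
]
results = [get_similarity(row) for row in rows]

# What the double loop effectively compares: characters against words.
char_vs_word = counter_cosine_similarity(
    Counter('action'), Counter(['zoe_saldana', 'action']))
```

With the real frame, the same function could be passed to apply as `frame['similarity'] = frame.apply(get_similarity, axis=1)`, which stores one list of similarities per row.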