I am new to pandas, and I have these two Series:
train['description_1']
and train['description_2']
Each row of both Series contains a vector.
from scipy.spatial.distance import cosine
item3 = pd.concat([train['description_1'], train['description_2']], axis = 1)
cos_vec = item3.apply(cosine)
The exception is TypeError: ('cosine() takes exactly 2 arguments (1 given)', u'occurred at index description_1')
Each element of train['description_1'] and train['description_2'] is a vector.
I am expecting something like this:
train_1 train_2
[1.0,2.0] [2.0,3.0]
[2.0,2.0] [3.0,2.0]
Output:
cos_sim
x
y
Answer 0 (score: 3)
You need:
import pandas as pd
from scipy.spatial.distance import cosine
df = pd.DataFrame({'description_1': [0.1, 0.32, 0.3],
                   'description_2': [0.4, 0.5, 0.6]})
print(df)
description_1 description_2
0 0.10 0.4
1 0.32 0.5
2 0.30 0.6
cos_vec = (1 - cosine(df["description_1"], df["description_2"]))
print (cos_vec)
0.962571458085
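Note that scipy's cosine() returns the cosine distance, which is why the result is subtracted from 1 to get a similarity. As a rough sanity check, the same number can be reproduced from the definition dot(a, b) / (|a| * |b|). This is only a sketch; the names a, b, manual_sim and scipy_sim are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cosine

df = pd.DataFrame({'description_1': [0.1, 0.32, 0.3],
                   'description_2': [0.4, 0.5, 0.6]})

# Treat each whole column as one vector (illustrative names)
a = df['description_1'].to_numpy()
b = df['description_2'].to_numpy()

# Cosine similarity from its definition: dot(a, b) / (|a| * |b|)
manual_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# scipy's cosine() is a distance, so similarity = 1 - distance
scipy_sim = 1 - cosine(a, b)

print(manual_sim, scipy_sim)
```

Both values should agree up to floating-point rounding.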
Edit:
import pandas as pd
from scipy.spatial.distance import cosine
df = pd.DataFrame({'description_1': [[1.0, 2.0], [2.0, 2.0]],
                   'description_2': [[2.0, 3.0], [3.0, 2.0]]})
print(df)
description_1 description_2
0 [1.0, 2.0] [2.0, 3.0]
1 [2.0, 2.0] [3.0, 2.0]
cos_vec = df.apply(lambda x: (1 - cosine(x["description_1"], x["description_2"])), axis=1)
print (cos_vec)
0 0.992278
1 0.980581
dtype: float64
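If the DataFrame is large, calling a Python function once per row with apply may be slow. A possible alternative, assuming every vector in both columns has the same length, is to stack the lists into 2-D NumPy arrays and compute all row-wise similarities at once. This is a sketch rather than part of the original answer; the names a, b and the cos_sim column are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'description_1': [[1.0, 2.0], [2.0, 2.0]],
                   'description_2': [[2.0, 3.0], [3.0, 2.0]]})

# Stack the per-row lists into arrays of shape (n_rows, vector_length).
# Assumption: all vectors have the same length.
a = np.array(df['description_1'].tolist())
b = np.array(df['description_2'].tolist())

# Row-wise dot products divided by the product of the row norms
cos_sim = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

# Store the result as a new column next to the inputs
df['cos_sim'] = cos_sim
print(df['cos_sim'])
```

For the two rows above this should give the same values as the apply version, roughly 0.992278 and 0.980581. A row containing an all-zero vector would produce a division by zero, so such rows would need separate handling.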