In my DataFrame df, I have the following column, tf-idf:
tf-idf
0 {u'selection': 3.83579393163, u'carltons': 7.0...
1 {u'precise': 6.43261849762, u'thomas': 3.31980...
2 {u'just': 2.70047792082, u'issued': 4.42829758...
3 {u'englishreading': 9.88788310056, u'all': 1.6...
4 {u'they': 1.89922701484, u'gangstergenka': 10....
5 {u'since': 1.45530416153, u'less': 3.956522477...
6 {u'exclusive': 10.4488880129, u'producer': 2.6...
7 {u'taxi': 6.04485296662, u'all': 1.64302370465...
8 {u'houston': 3.93463976627, u'frankie': 6.0306...
9 {u'phenomenon': 5.74474837417, u'deborash': 10...
10 {u'zwigoff': 19.7757662011, u'september': 1.90...
11 {u'gospels': 7.9419729515, u'theft': 6.0028887...
I'm having trouble computing the cosine similarity between two samples, for example between df['tf-idf'][0] and df['tf-idf'][1].
Answer 0 (score: 2)
You can use scikit-learn:
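from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Turn each {term: weight} dict into one row of a sparse matrix
a = DictVectorizer().fit_transform(df['tf-idf'])
# Returns a 1x1 array holding the similarity of rows 0 and 1
cosine_similarity(a[0], a[1])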
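If you want to see what that computes, or need a result without scikit-learn, here is a minimal sketch of cosine similarity worked out directly on two dicts. The function name cosine_similarity_dicts and the sample weights are made up for illustration; they are not from the question's data or from any library.

```python
import math

def cosine_similarity_dicts(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    # Only terms present in both dicts contribute to the dot product.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical weights, not taken from the question's DataFrame.
doc_a = {"all": 1.0, "taxi": 2.0}
doc_b = {"all": 3.0, "houston": 4.0}
sim = cosine_similarity_dicts(doc_a, doc_b)
```

On the DataFrame from the question, this would be called as cosine_similarity_dicts(df['tf-idf'][0], df['tf-idf'][1]), and it should agree with the scikit-learn result up to floating-point error. The DictVectorizer route is likely the better choice when you need many pairwise similarities at once.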