Text similarity. Cosine similarity. Explaining the result

Time: 2019-10-23 18:56:15

Tags: python

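I have just finished my first cosine similarity code, and it runs and produces a result. However, I would like to present the result in a more detailed way. Is there any way to see how the 37.8% was calculated? A chart or something similar would be nice too.

Here is my code: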

from scipy.spatial import distance
from sklearn.feature_extraction.text import CountVectorizer

# Read the two documents to compare; "with" closes the files afterwards
with open(r"C:\Users\Output11.txt") as f:
    doc1 = f.read()
with open(r"C:\Users\Output22.txt") as f1:
    doc2 = f1.read()

def cosine_distance(s1, s2):
    # Build word-count vectors over the combined vocabulary of both texts
    # (use the arguments, not the global doc1/doc2)
    vectorizer = CountVectorizer()
    vectors = vectorizer.fit_transform([s1, s2]).toarray()
    # SciPy returns the cosine distance, i.e. 1 - cosine similarity
    cosine = distance.cosine(vectors[0], vectors[1])
    print('Similarity of two sentences are equal to ', round((1 - cosine) * 100, 2), '%')
    return cosine

cosine_distance(doc1, doc2)

The result, as mentioned:

Similarity of two sentences are equal to  37.8 %
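
One possible way to see where such a percentage comes from is to split the cosine into per-word contributions. The similarity is the dot product of the two count vectors divided by the product of their lengths, and only words that occur in both texts add to the dot product. The sketch below is an illustration, not the code from the question: it uses only the standard library, the helper names `term_counts` and `explain_cosine` are hypothetical, the two sample sentences are made up, and the tokenization (lowercase, tokens of two or more word characters) is meant to mirror `CountVectorizer`'s defaults.

```python
import math
import re
from collections import Counter


def term_counts(text):
    # Lowercase the text and keep tokens of two or more word characters,
    # which is intended to mirror CountVectorizer's default tokenization.
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))


def explain_cosine(text1, text2):
    # Hypothetical helper: returns the cosine similarity together with
    # the share of it contributed by each word common to both texts.
    c1, c2 = term_counts(text1), term_counts(text2)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0, {}
    # Words missing from either text multiply by zero, so only the
    # shared words contribute to the dot product.
    contributions = {
        term: c1[term] * c2[term] / (norm1 * norm2)
        for term in c1.keys() & c2.keys()
    }
    return sum(contributions.values()), contributions


# Made-up sample texts, used only for illustration
sim, parts = explain_cosine("the cat sat on the mat", "the dog sat on the log")
print(f"similarity = {sim:.2%}")
for term, share in sorted(parts.items(), key=lambda kv: -kv[1]):
    # A crude text bar: one '#' per two percentage points
    print(f"{term:>10}  {share:6.2%}  {'#' * round(share * 50)}")
```

For these two sample sentences the similarity is 75%: "the" contributes 50 points, while "sat" and "on" contribute 12.5 points each. Passing doc1 and doc2 instead would list the words that account for the 37.8%, and the same per-word shares could be fed to a plotting library to produce a bar chart.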

0 answers:

No answers yet