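I have just finished writing my first cosine similarity script, and it runs and prints a result. However, I would like to present the result in a more concrete way. Is there any way to see how the 37.8% was calculated? Ideally something cool, like a graph or something similar.
Here is my code: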
from scipy.spatial import distance
from sklearn.feature_extraction.text import CountVectorizer

# Read both documents; "with" closes the files automatically
with open(r"C:\Users\Output11.txt") as f:
    doc1 = f.read()
with open(r"C:\Users\Output22.txt") as f1:
    doc2 = f1.read()

def cosine_distance(s1, s2):
    # Use the function's arguments instead of the global doc1/doc2
    allsentences = [s1, s2]
    vectorizer = CountVectorizer()
    # Document-term matrix: one row per document, one column per word
    all_sentences_to_vector = vectorizer.fit_transform(allsentences)
    vectors = all_sentences_to_vector.toarray()
    text_to_vector_v1 = vectors[0].tolist()
    text_to_vector_v2 = vectors[1].tolist()
    # scipy returns the cosine *distance*, i.e. 1 - cosine similarity
    cosine = distance.cosine(text_to_vector_v1, text_to_vector_v2)
    print('Similarity of two sentences are equal to ', round((1 - cosine) * 100, 2), '%')
    return cosine

cosine_distance(doc1, doc2)
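For reference, `scipy.spatial.distance.cosine` returns the cosine distance between the two word-count vectors $\mathbf{u}$ (doc1) and $\mathbf{v}$ (doc2), and the script prints $100 \times (1 - d)$, which is the cosine similarity as a percentage:

$$
d(\mathbf{u},\mathbf{v}) = 1 - \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert_2\,\lVert\mathbf{v}\rVert_2},
\qquad
1 - d = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2}\,\sqrt{\sum_i v_i^2}}
$$

Here $u_i$ and $v_i$ are the number of times the $i$-th vocabulary word occurs in each document. Because counts are never negative, only words that appear in both documents contribute to the numerator, and a result of 37.8% means the cosine of the angle between the two count vectors is about 0.378.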
As I said, the result is:
Similarity of two sentences are equal to 37.8 %
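To see where a number like 37.8% comes from, the computation can be unpacked word by word. The sketch below is a pure-Python re-implementation, not sklearn's own code: it assumes CountVectorizer's default settings (lowercasing and the token pattern `(?u)\b\w\w+\b`, which drops one-character tokens), lists every shared word with its counts, draws a small text bar for each word's share of the dot product, and prints the final division. The names `count_words` and `explain_similarity` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Assumption: mirrors CountVectorizer's defaults (lowercase=True and
# token_pattern=r"(?u)\b\w\w+\b"), so one-character tokens are ignored.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def count_words(text):
    """Return lowercased token counts, like one row of CountVectorizer output."""
    return Counter(TOKEN_RE.findall(text.lower()))


def explain_similarity(doc1, doc2, bar_width=30):
    """Print how the cosine similarity of two texts is built up and return it."""
    c1, c2 = count_words(doc1), count_words(doc2)
    norm1 = sqrt(sum(n * n for n in c1.values()))
    norm2 = sqrt(sum(n * n for n in c2.values()))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("each document needs at least one token")

    # Only words present in both documents add to the dot product;
    # sort by contribution (largest first), then alphabetically.
    shared = sorted(set(c1) & set(c2), key=lambda w: (-c1[w] * c2[w], w))
    dot = sum(c1[w] * c2[w] for w in shared)
    similarity = dot / (norm1 * norm2)

    print(f"{'word':<15}{'doc1':>6}{'doc2':>6}{'product':>9}  share of dot product")
    for w in shared:
        product = c1[w] * c2[w]
        bar = "#" * round(bar_width * product / dot)  # text "bar chart"
        print(f"{w:<15}{c1[w]:>6}{c2[w]:>6}{product:>9}  {bar}")
    print(f"dot product = {dot}")
    print(f"|doc1| = {norm1:.3f}, |doc2| = {norm2:.3f}")
    print(f"similarity = {dot} / ({norm1:.3f} * {norm2:.3f}) = {similarity:.2%}")
    return similarity


if __name__ == "__main__":
    # Toy example; replace with the contents of the two files.
    explain_similarity("the cat sat on the mat", "the dog sat on the log")
```

To use it on the two files, read them into `doc1` and `doc2` as in the code above and call `explain_similarity(doc1, doc2)`; with default CountVectorizer settings the last line should agree with the script's percentage, though that is assumed rather than checked against the actual files. For a real graphical chart, the same per-word products could be passed to a plotting function such as matplotlib's `plt.barh`.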