Cosine similarity and sentences

Posted: 2018-10-07 08:07:40

Tags: python python-3.x

So I'm trying to compute cosine similarity on a text file I have: https://lms.uwa.edu.au/bbcswebdav/pid-1143173-dt-content-rid-16133365_1/courses/CITS1401_SEM-2_2018/CITS1401_SEM-2_2018_ImportedContent_20180713092326/CITS1401_SEM-1_2018/Unit%20Content/Resources/Project2_2018/sample.txt
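For reference, one common way to compare two sentences is to turn each into a vector of word counts and take the cosine of the angle between them: the dot product of the two vectors divided by the product of their lengths. The sketch below assumes a very simple tokenizer (lowercase, keep only letters and apostrophes); `tokenize` and `cosine_similarity` are illustrative names, not functions from any library.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product only needs the words the two sentences share
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


s1 = "the mississippi is well worth reading about"
s2 = "it is not a commonplace river, but on the contrary is in all ways remarkable"
print(round(cosine_similarity(s1, s2), 3))  # 0.275
```

Here the only shared words are "the" and "is", which is why the score is low; this is exactly the effect that removing common words is meant to address.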

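I'd like to know how to print the text sentence by sentence, rather than line by line with readline(). I'm trying to create a variable for each sentence, for example: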

s1 = "the mississippi is well worth reading about"
s2 = "it is not a commonplace river, but on the contrary is in all ways remarkable"

Is this the right first step? If so, my next step is to remove common words from each sentence, leaving only the distinctive words to compare.
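Removing common words (often called stopwords) can be sketched as a set filter. The `STOPWORDS` set here is a tiny hand-picked list chosen for this example only; a real project would likely use a fuller list, such as the one shipped with NLTK.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"a", "about", "all", "but", "in", "is", "it", "not", "on", "the"}


def content_words(sentence):
    """Return the set of distinct non-stopword words in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


s1 = "the mississippi is well worth reading about"
s2 = "it is not a commonplace river, but on the contrary is in all ways remarkable"

w1, w2 = content_words(s1), content_words(s2)
print(sorted(w1))  # ['mississippi', 'reading', 'well', 'worth']
print(sorted(w2))  # ['commonplace', 'contrary', 'remarkable', 'river', 'ways']
print(w1 & w2)     # set(): no content words in common
```

Returning a set keeps only unique words; if you later want cosine similarity on counts, return a `collections.Counter` of the filtered words instead.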

How do I stop at each period and store that sentence in a variable while looping through the text?
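Rather than creating numbered variables like `s1`, `s2`, a list of sentences is usually easier to loop over (`sentences[0]` plays the role of `s1`). A minimal sketch, assuming sentences end with `.`, `!` or `?` followed by whitespace; `split_sentences` is a hypothetical helper, and this heuristic will also split after abbreviations such as "Mr.".

```python
import re


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace.

    Newlines are collapsed to single spaces first, so a sentence that wraps
    across lines in the file stays in one piece.
    """
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]  # drop the empty string left by empty input


sample = """The Mississippi is well worth reading about. It is not a
commonplace river, but on the contrary is in all ways remarkable."""

sentences = split_sentences(sample)
for i, s in enumerate(sentences, start=1):
    print(i, s)
```

To use it on the file, something like `with open("sample.txt") as f: sentences = split_sentences(f.read())` reads the whole text at once instead of line by line.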

Thanks

1 answer:

Answer 0 (score: 1)

Is this what you mean?

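with open("file.txt", "r") as in_f:
    # Replace newlines with spaces (not '') so words at line breaks don't merge
    sentences = in_f.read().replace("\n", " ").split(".")
    for s in sentences:
        s = s.strip()
        if not s:
            continue  # skip empty pieces, e.g. the one after the final period
        # your code here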
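Putting the pieces together, a sketch of the full pipeline: split into sentences, drop stopwords, and compare every pair of sentences with cosine similarity. The stopword list and the third sentence in `text` are made up for illustration; swap in the real file contents with `open(...).read()`.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list (an assumption; use a fuller one in practice)
STOPWORDS = {"a", "about", "all", "but", "in", "is", "it", "not", "on", "the"}


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace, ignoring line breaks."""
    text = " ".join(text.split())
    return [p for p in re.split(r"(?<=[.!?])\s+", text) if p]


def word_counts(sentence):
    """Count the non-stopword words in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(va, vb):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# The last sentence is invented for this example
text = """The Mississippi is well worth reading about. It is not a
commonplace river, but on the contrary is in all ways remarkable.
The river is remarkable."""

sentences = split_sentences(text)
vectors = [word_counts(s) for s in sentences]
for i, j in combinations(range(len(sentences)), 2):
    print(i, j, round(cosine(vectors[i], vectors[j]), 3))
```

With stopwords removed, sentences 1 and 2 share "river" and "remarkable" and score about 0.632, while sentence 0 shares nothing with either and scores 0.0.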