I got help from here, but the problem is that my code prints the "false" result many times.
...
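I want the code to iterate over the files and report true / false only once, depending on whether any similarity is greater than 70.
Here is my code block: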
def cosine_sim(text1, text2):
    text11 = text1
    text22 = open(text2, 'r', encoding='utf-8', errors='ignore').read()
    tfidf = self.vectorizer.fit_transform([text11, text22])
    n = ((tfidf * tfidf.T) * 100).A[0, 1]
    return '%.3f%% ' % n

def check():
    for path in Path(my_dir).iterdir():
        for n in cosine_sim(anytext, path):
            if any(float(x) > 70 for x in n):
                print("Similarities found !!!...")
                break
            else:
                print("No Similarities...")
                break
Console output:
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
No Similarities...
Process finished with exit code 0
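
For reference, two things in the code above appear to produce the repeated output: `cosine_sim` returns a formatted string, so `for n in cosine_sim(...)` walks over its characters rather than over scores, and the `print` calls sit inside the per-file loop, so one line is printed for every file. A minimal sketch of a version that prints only once follows. It makes several assumptions: `cosine_sim` here is a standard-library stand-in based on raw word counts rather than the TF-IDF vectorizer used above, it takes two strings instead of a path, and `has_similar` and the parameters of `check` are hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter
from pathlib import Path


def cosine_sim(text1, text2):
    """Return the cosine similarity of two texts as a percentage (float).

    Assumption: plain word counts are used here as a stand-in for TF-IDF.
    """
    a = Counter(re.findall(r"\w+", text1.lower()))
    b = Counter(re.findall(r"\w+", text2.lower()))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return 100.0 * dot / norm if norm else 0.0


def has_similar(text, directory, threshold=70.0):
    """Return True as soon as one file in `directory` scores above `threshold`."""
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        other = path.read_text(encoding="utf-8", errors="ignore")
        if cosine_sim(text, other) > threshold:
            return True  # stop at the first match
    return False  # reached only after every file has been checked


def check(text, directory):
    # The print happens outside the loop, so exactly one line is produced.
    if has_similar(text, directory):
        print("Similarities found !!!...")
    else:
        print("No Similarities...")
```

Returning a number from `cosine_sim` and formatting it only when displaying it keeps the comparison against 70 straightforward. The same `has_similar` loop should work unchanged if the stand-in is replaced by a TF-IDF based function that returns a float.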