Question

I am trying to use a score to compare the similarity between two texts. Here is my code:

risk_list1_txt = []
scoreList = []
similarityDict = {}
theScore = 0
for text1 in risk_list1:
    similarityDict['FileName'] = text1
    theText1 = open(path1 + "\\" + text1).read().lower()
    for text2 in range(len(risk_list2)):
        theText2 = open(path2 + "\\" + risk_list2[text2]).read().lower()
        theScore = fuzz.token_set_ratio(theText1,theText2)
        similarityDict[risk_list2[text2]] = theScore
    outFile= open(fileDestDir,'w')
    outFile.write(str(theScore))
outFile.close()

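The problem is that my outfile only contains the score of the last comparison, even though risk_list1 and risk_list2 each hold 3 different text files. I can't get this loop to work properly.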

Answer 1

You are opening the file in write mode rather than append mode. Replace

outFile= open(fileDestDir,'w')

with

outFile= open(fileDestDir,'a')

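Write mode truncates the file's contents each time the file is opened; append mode adds to whatever is already there. For more on file modes, see the documentation for open().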
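To see the difference between the two modes in isolation, here is a minimal sketch. The helper names (write_scores, read_lines), the temporary file names, and the sample scores are made up for illustration and are not part of the question's code. It reopens the file for every score, the way the question's loop does, once with 'w' and once with 'a'.

```python
import os
import tempfile


def write_scores(path, scores, mode):
    """Reopen the file for every score, as the question's loop does."""
    for score in scores:
        with open(path, mode) as out_file:
            out_file.write(str(score) + "\n")


def read_lines(path):
    """Return the file's contents as a list of lines."""
    with open(path) as in_file:
        return in_file.read().splitlines()


# Hypothetical output files in a throwaway directory.
tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, "scores_w.txt")
a_path = os.path.join(tmp_dir, "scores_a.txt")

write_scores(w_path, [10, 20, 30], "w")  # each open truncates the file
write_scores(a_path, [10, 20, 30], "a")  # each open keeps earlier lines

print(read_lines(w_path))  # only the last score survives
print(read_lines(a_path))  # all three scores are kept
```

Note that append mode also keeps output from earlier runs of the script, so the file keeps growing unless it is cleared between runs.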

Answer 2

It looks like this may be an indentation problem.

for text1 in risk_list1:
    # iterates through each text1
    # ...

    for text2 in range(len(risk_list2)):
        # iterates through each text2
        theScore = fuzz.token_set_ratio(theText1,theText2)
        # theScore gets set

    # we've iterated all the way through the text2's

    outFile= open(fileDestDir,'w')
    outFile.write(str(theScore))
    # open and write!

As shaktimaan pointed out in his answer, opening a file with the 'w' flag blanks the file every time. Use 'a' instead to append to the file.
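Putting both points together, one possible restructuring is to open the output file once, before the loops, and write inside the inner loop so that every pair gets its own line. The sketch below is only an illustration under stated assumptions: write_all_scores and similarity are hypothetical names, the texts are passed in as dictionaries rather than read from path1 and path2, and the scorer is a standard-library stand-in based on difflib, not fuzz.token_set_ratio, so the numbers will differ from the original library's.

```python
import os
import tempfile
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in scorer (assumption): 0-100 score from difflib,
    # NOT the same algorithm as fuzz.token_set_ratio.
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def write_all_scores(texts1, texts2, out_path, scorer=similarity):
    """texts1 and texts2 map a file name to its already-read text."""
    rows = []
    # Opened once, before both loops, so 'w' truncates only once.
    with open(out_path, "w") as out_file:
        for name1, text1 in texts1.items():
            for name2, text2 in texts2.items():
                score = scorer(text1.lower(), text2.lower())
                # The write is inside the inner loop: one line per pair.
                out_file.write(f"{name1}\t{name2}\t{score}\n")
                rows.append((name1, name2, score))
    return rows


# Made-up sample data standing in for the files in risk_list1 / risk_list2.
texts1 = {"a.txt": "credit risk", "b.txt": "market risk"}
texts2 = {"a.txt": "credit risk", "c.txt": "zzz"}
out_path = os.path.join(tempfile.mkdtemp(), "scores.txt")
rows = write_all_scores(texts1, texts2, out_path)
print(len(rows))  # one row per (text1, text2) pair
```

Because the file is opened a single time, 'w' is safe here and each run starts from a clean file; to use the original scorer, pass fuzz.token_set_ratio as the scorer argument.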

My for loop only gives the last result instead of all results
