Question

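I think I may be fundamentally confused about something in Python or nltk. I am generating a list of tokens from paper abstracts and trying to check whether a search term appears among the tokens. I do know about concordance, but it doesn't work well for the comparison I have in mind.

Here is my code: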

import nltk

def tokenize(text):
    # text must provide a get_text() method that returns a plain string
    tokens = nltk.word_tokenize(text.get_text())
    return tokens

def search_abstract_single_word(tokens, keyword):
    match = 0
    for token in tokens:
        if token == keyword:
            match += 1
    return match

def search_file_single_word(abstract_list, keyword):
    matches = list()
    for item in abstract_list:
        tokens = tokenize(item)
        match = search_abstract_single_word(tokens, keyword)
        matches.append(match)
    return matches

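I have confirmed that the tokens and the keyword being passed in are correct, but match (and every entry in the matches list) always comes out as zero. My understanding was that word_tokenize returns an array of strings, so I don't understand why, for example, when token = computer and keyword = computer, token == keyword does not return True and increment match.

Edit: In a standalone class/main method, this code does work. However, the code is called from a tkinter window, as follows: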

self.keywords = ""
....
self.keywords_box = Text(self.Frame2)
....
self.Submit = Button(master)
self.Submit.configure(command=self.submit)
....
#triggered by submit button
def submit(self):
    self.keywords += self.keywords_box.get("1.0", END)

#triggered by run button after keyword saved
def run(self):
    search_input = self.keywords
    ....
    #use pandas to read excel file, create abstracts, and store
    ....
    matches = search_file_single_word(abstract_list, search_input)
    for match in matches:
        self.output_box.insert(END, match)
        self.output_box.insert(END, '\n')

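I had assumed that, because print(keyword) prints the right thing when I insert it into search_file_single_word, the value was being passed correctly. Is it actually just passing the tkinter attribute and refusing to evaluate it against the tokens?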

Answer 1

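Moral of the story: choose carefully. Using textbox.get("1.0", END) on a tkinter Text widget returns the text with a trailing newline, and string != string\n, so the comparison against each token never succeeds. I found the solution in the answer to this post.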
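A minimal sketch of the problem and one possible fix, assuming a single keyword was typed into the box. The helper names clean_keyword and count_matches are hypothetical and not part of the original code, and the string "computer\n" stands in for what the Text widget's get("1.0", END) would return. Printing with repr() makes the hidden newline visible, which a plain print() does not.

```python
def clean_keyword(raw):
    """Strip surrounding whitespace, including a trailing newline."""
    return raw.strip()


def count_matches(tokens, keyword):
    """Count the tokens that are exactly equal to keyword."""
    return sum(1 for token in tokens if token == keyword)


# Hypothetical token list, as word_tokenize might produce for a short abstract.
tokens = ["a", "computer", "is", "a", "computer"]

# Stand-in for the value read from the Text widget with get("1.0", END).
raw_keyword = "computer\n"

print(repr(raw_keyword))                                  # shows the trailing \n
print(count_matches(tokens, raw_keyword))                 # no token equals "computer\n"
print(count_matches(tokens, clean_keyword(raw_keyword)))  # matches once the newline is gone
```

In the tkinter code itself, an alternative would be to read the widget with self.keywords_box.get("1.0", "end-1c"), which leaves off the final character, or to call .strip() on the result inside submit().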
