Question

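I am new to NLP and Python. I am trying to use object standardization to replace abbreviations with their full meanings. I found some code online and modified it to test it on a passage from Wikipedia, but all the code does is print out the original text. Can anyone help a newbie in need?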

Here is the code:

import nltk

lookup_dict = {'EC': 'European Commission', 'EU': 'European Union', "ECSC": "European Coal and Steel Community",
               "EEC": "European Economic Community"}


def _lookup_words(input_text):
    words = input_text.split()
    new_words = []
    for word in words:
        if word.lower() in lookup_dict:
            word = lookup_dict[word.lower()]
        new_words.append(word)
    new_text = " ".join(new_words)

    print(new_text)
    return new_text


_lookup_words(
    "The High Authority was the supranational administrative executive of the new European Coal and Steel Community ECSC. It took office first on 10 August 1952 in Luxembourg. In 1958, the Treaties of Rome had established two new communities alongside the ECSC: the eec and the European Atomic Energy Community (Euratom). However their executives were called Commissions rather than High Authorities")

Thanks in advance for your help!

Answer 1

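In your case, two of the abbreviations in your lookup dictionary appear in your input sentence: ECSC and EEC (written "eec"). Calling split() splits the input on whitespace, but in your sentence ECSC only occurs as "ECSC." and "ECSC:". Those are the tokens you get after the split, not "ECSC", so they cannot be mapped. There is also a second problem: you look up word.lower(), but the dictionary keys are upper case, so no token can ever match, not even "eec". I suggest stripping the punctuation, making the case of the lookup match the keys, and running it again.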

Object Standardization using NLTK