Ignoring certain words when spell-checking with Enchant

Time: 2017-11-10 18:34:38

Tags: python enchant

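I am using Python Enchant to spell-check some documents and want it to ignore proper nouns. The trade-off between having it fix misspelled proper nouns and having it wrongly "correct" the ones it does not know seems too great (though any suggestions on that would also be welcome!).

Here is my code, but at the moment it still corrects the words in the NNP list.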

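from enchant.checker import SpellChecker
from nltk import pos_tag, word_tokenize

chkr = SpellChecker("en_GB")

f = open('ca001_mci_17071971.txt', 'r', encoding='utf-8')
text = f.read()
# Collect the words NLTK tags as singular proper nouns.
tagged = pos_tag(word_tokenize(text))
NNP = [word for word, tag in tagged if tag == 'NNP']
chkr.set_text(text)
for err in chkr:
    # Meant to skip proper nouns; this is the check that is not working.
    if err is word in NNP:
        err.ignore_always()
    else:
        sug = err.suggest()[0]
        err.replace(sug)

corrected = chkr.get_text()
print(NNP)
print(corrected)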

Output:

['Boojum', 'Boy', 'Charlotte']

Those blessed days of summer are here when you just need to wear shirt,       
trousers, & sandals all day, & the kiddies run naked in & out of the garden. 
Both the Boomer Boy & Charlotte are getting quite tanned. We're having a long
spell of time settled weather.

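As you can see, 'Boojum' has been corrected to 'Boomer' even though it is in the NNP list.

Can anyone point me in the right direction? I am very new to Python. Thanks in advance.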

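1 Answer:

Answer 0: (score: 1)

I figured it out. I had to tell it that the misspelled words are strings, so that they can be compared with the words in the NNP list. New code: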

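import os

from enchant.checker import SpellChecker
from nltk import pos_tag, word_tokenize

chkr = SpellChecker("en_GB")

# `path` is the directory holding the text files; it is defined elsewhere.
for file in os.listdir(path):
    # os.listdir() returns bare file names, so join them with the directory.
    with open(os.path.join(path, file), 'r', encoding='utf-8') as f:
        text = f.read()
    # Collect the words NLTK tags as singular proper nouns.
    tagged = pos_tag(word_tokenize(text))
    NNP = [word for word, tag in tagged if tag == 'NNP']
    chkr.set_text(text)
    for err in chkr:
        if str(err.word) in NNP:
            # Proper noun: leave it alone for the rest of the text.
            err.ignore_always()
        else:
            sug = err.suggest()
            # Replace only when Enchant actually offers a suggestion.
            if sug:
                err.replace(sug[0])

    corrected = chkr.get_text()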

I also changed it so that, if Enchant has no suggestions, the error is left in place.
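
A likely reason the original check never matched: Python reads `err is word in NNP` as a chained comparison, that is, `(err is word) and (word in NNP)`. The `is` half tests whether the loop object and a string are the same object, which is false, so every error falls through to the `else` branch. The sketch below reproduces this without Enchant or NLTK installed. It assumes `word` was bound to some string at that point, and `FakeError` is a hypothetical stand-in (not part of pyenchant) that only carries a `.word` attribute the way the loop variable does.

```python
class FakeError:
    """Hypothetical stand-in for the object yielded by `for err in chkr`."""

    def __init__(self, word):
        self.word = word


def should_skip_buggy(err, word, proper_nouns):
    # Chained comparison: (err is word) and (word in proper_nouns).
    # `err` is an object and `word` is a str, so the identity test fails.
    return err is word in proper_nouns


def should_skip_fixed(err, proper_nouns):
    # Compare the misspelled text itself against the proper-noun list.
    return str(err.word) in proper_nouns


proper_nouns = ["Boojum", "Boy", "Charlotte"]
err = FakeError("Boojum")

print(should_skip_buggy(err, "Boojum", proper_nouns))  # False
print(should_skip_fixed(err, proper_nouns))            # True
```

The fixed version is the same membership test used in the new code above: it looks at `err.word`, the text of the misspelling, instead of the checker object.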