Question

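I'm building a spelling-correction algorithm based on Peter Norvig's code. Adding my own cases has made it more effective, since I don't have a reference text like big.txt and my text consists mostly of non-English words.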
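
For reference, the core of Norvig's corrector can be condensed as below. This is a sketch, not the asker's code: it trains on whatever text you pass in instead of big.txt. Building the edit alphabet from that text, so that non-English letters are also tried, is an assumption of this sketch; Norvig's original uses only a-z. The `Corrector` class name is hypothetical.

```python
import re
from collections import Counter


def words(text):
    # \w matches Unicode letters in Python 3, so non-English words are kept
    return re.findall(r"\w+", text.lower())


class Corrector:
    """Hypothetical wrapper around Norvig-style candidate generation."""

    def __init__(self, text):
        self.counts = Counter(words(text))
        # Assumption: take the edit alphabet from the corpus itself
        self.letters = sorted(set("".join(self.counts)))

    def edits1(self, word):
        """All strings one delete, transpose, replace or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.letters]
        inserts = [a + c + b for a, b in splits for c in self.letters]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word):
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def known(self, candidates):
        return {w for w in candidates if w in self.counts}

    def candidates(self, word):
        # Same priority as Norvig: the word itself, then 1 edit, then 2 edits
        return (self.known([word]) or self.known(self.edits1(word))
                or self.known(self.edits2(word)) or {word})

    def correct(self, word):
        # The candidate that is most frequent in the training text wins
        return max(self.candidates(word), key=lambda w: self.counts[w])
```

In this sketch a word that already occurs in the training text is always returned unchanged, so `Corrector("garden garden").correct("golden")` yields "garden" only because "golden" never appears in that text.
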
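I noticed that "golden" was being auto-corrected to "garden" (because "garden" occurs more often in the text). So I decided to add bigram collocations from the text. That way "golden" stays "golden" when it appears together with the words it usually occurs with. I'm implementing this and need some help. Here is part of the code: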
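
The bigram idea amounts to letting the neighbouring words vote before raw frequency does. A minimal sketch, assuming you already have a set of candidate corrections for a word; the function names and the scoring rule are illustrative and come from neither NLTK nor Norvig. Each candidate is scored first by how often it occurs next to the previous or next word, and only then by its unigram count.

```python
from collections import Counter


def tokens_to_counts(tokens):
    """Unigram counts and adjacent-bigram counts from a token list."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def pick_in_context(candidates, prev_word, next_word, unigrams, bigrams):
    """Prefer the candidate seen most often next to its neighbours;
    fall back to plain word frequency when the context says nothing."""
    def score(c):
        context = bigrams[(prev_word, c)] + bigrams[(c, next_word)]
        # c comes last only to make ties deterministic
        return (context, unigrams[c], c)
    return max(candidates, key=score)
```

Putting the context count first in the tuple means a single observed collocation outweighs any frequency difference; weighting the two terms instead is an equally reasonable design.
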

```python
import nltk
from nltk.collocations import BigramCollocationFinder

bigram_measures = nltk.collocations.BigramAssocMeasures()
# Build bigram pairs from the token list `tokenized` (created earlier in my code)
finder = BigramCollocationFinder.from_words(tokenized)
# Bigrams that occur fewer than 3 times are not considered
finder.apply_freq_filter(3)
```
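
For clarity about what the finder holds after filtering: with `window_size` left at its default of 2, `BigramCollocationFinder.from_words` counts adjacent word pairs, and `apply_freq_filter(3)` removes every pair seen fewer than 3 times from `finder.ngram_fd`. The plain-Python sketch below should produce the same pairs and counts (the function name `frequent_bigrams` is hypothetical), which makes it easy to check the filter on a small sample.

```python
from collections import Counter


def frequent_bigrams(tokens, min_freq=3):
    """Adjacent bigrams occurring at least min_freq times, intended to match
    what from_words(tokens) + apply_freq_filter(min_freq) leaves in ngram_fd."""
    counts = Counter(zip(tokens, tokens[1:]))
    return {bigram: n for bigram, n in counts.items() if n >= min_freq}
```

With NLTK itself, `set(finder.ngram_fd)` gives the surviving pairs, and `finder.nbest(bigram_measures.pmi, 20)` returns the 20 highest-scoring ones if you want only strong collocations rather than every frequent pair.
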

The next thing I want to do is add an exception: if a word appears in "finder", it should not be replaced. How can I use "finder" for that?
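
One way to add the exception is to check, before correcting a token, whether it forms a kept bigram with its left or right neighbour. A sketch, assuming `collocations` is a set of `(word1, word2)` tuples such as `set(finder.ngram_fd)` and `correct` is your existing single-word corrector; `correct_sentence` is a hypothetical name. This is stricter than skipping every word that appears anywhere in the finder: "golden" is protected in "golden gate" but can still be corrected elsewhere.

```python
def correct_sentence(tokens, correct, collocations):
    """Correct each token with correct(word), except when the token forms a
    kept collocation with its left or right neighbour."""
    out = []
    for i, word in enumerate(tokens):
        prev_word = tokens[i - 1] if i > 0 else None
        next_word = tokens[i + 1] if i + 1 < len(tokens) else None
        if (prev_word, word) in collocations or (word, next_word) in collocations:
            out.append(word)  # part of a known collocation: leave unchanged
        else:
            out.append(correct(word))
    return out
```

Neighbours are taken from the original tokens, not the corrected ones, so one correction cannot unlock or block another.
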
(If the problem statement is unclear, I'll edit it on request.)

Bigram collocations for spelling correction, Python
