Spelling corrector for a nested list

Posted: 2018-12-07 06:50:07

Tags: python nlp nltk nested-lists gensim

I have a nested list:

collection_words = [['cat','doag','tseken'],['phisboal','melk','tsokoleyt'],['eagle','elephant','bare']]

I want to correct some of the elements in this nested list. Here is my code:

from symspellpy.symspellpy import SymSpell, Verbosity

initial_capacity = 83000
max_edit_distance_dictionary = 2
prefix_length = 7
max_edit_distance_lookup = 2
sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

def correct_spelling(doc):
    return [[[word.term for word in sym_spell.lookup_compound(words, max_edit_distance_lookup)]
             for words in texts]
            for texts in doc]

correct_spelling(collection_words)
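Independently of the error reported below, this comprehension has three levels while the input has only two. `lookup_compound` returns a list of suggestion objects for each call, so the innermost comprehension produces a list of terms per word, and even a successful run would give three levels of nesting instead of two. The sketch below shows the shape difference without symspellpy; `Suggestion`, `TOY_FIXES` and `fake_lookup` are hypothetical stand-ins that only mimic that list-of-objects-with-`.term` return type.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical stand-in for symspellpy's SuggestItem (only .term is modelled)."""
    term: str


# Hypothetical toy corrections; a real run would query a loaded SymSpell dictionary.
TOY_FIXES = {"doag": "dog", "tseken": "chicken"}


def fake_lookup(word):
    """Mimic lookup_compound's return type: a list of suggestion objects."""
    return [Suggestion(TOY_FIXES.get(word, word))]


doc = [['cat', 'doag', 'tseken'], ['eagle']]

# Three levels, as in correct_spelling: every word becomes a list of terms.
too_deep = [[[s.term for s in fake_lookup(w)] for w in texts] for texts in doc]
print(too_deep)    # [[['cat'], ['dog'], ['chicken']], [['eagle']]]

# Two levels, keeping only the top suggestion: the input shape is preserved.
same_shape = [[fake_lookup(w)[0].term for w in texts] for texts in doc]
print(same_shape)  # [['cat', 'dog', 'chicken'], ['eagle']]
```

With symspellpy, the two-level form would read `[[sym_spell.lookup_compound(w, max_edit_distance_lookup)[0].term for w in texts] for texts in doc]`. This assumes a frequency dictionary has been loaded with `load_dictionary`, which the snippet above never calls, and that each lookup returns at least one suggestion.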

But I got:

TypeError: decoding to str: need a bytes-like object, list found
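This exact wording is what CPython raises when `str(obj, encoding)` is asked to decode an object that is not bytes-like, here a list. So somewhere in the call chain a list probably reached a step that expected a single string or bytes. Which call did it is not visible from the snippet, so the sketch below only reproduces the message and adds a hypothetical guard, `ensure_word`, that fails earlier and names the offending value.

```python
def ensure_word(value):
    """Hypothetical guard: reject anything that is not a single string."""
    if not isinstance(value, str):
        raise TypeError(f"expected one word as str, got {type(value).__name__}: {value!r}")
    return value


# Decoding a list as if it were bytes produces the message from the question.
try:
    str(['cat', 'doag', 'tseken'], 'utf-8')
except TypeError as exc:
    decode_message = str(exc)
print(decode_message)  # decoding to str: need a bytes-like object, list found

# The guard reports the offending value before any library call sees it.
try:
    ensure_word(['cat', 'doag', 'tseken'])
except TypeError as exc:
    guard_message = str(exc)
print(guard_message)
```

Calling `ensure_word` on each item just before `lookup_compound` is one way to find out whether a whole sublist, rather than a single word, is being passed in.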

A longer solution is:

for sublist in collection_words:  # renamed from `list` to avoid shadowing the built-in
    corrected_list = []
    for words in sublist:
        suggestions = sym_spell.lookup_compound(words, max_edit_distance_lookup)
        for word in suggestions:
            corrected_list.append(word.term)

but this does not give me the desired nested structure. The desired output is:

[['cat','dog','chicken'],['fishbowl','milk','chocolate'],['eagle','elephant','bare']]
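In the longer loop, `corrected_list` is recreated for every sublist but never stored, so the per-sublist results are never collected into an outer list. A minimal sketch of a loop that keeps the nesting appends each inner list to an outer result. `TOY_FIXES` and `correct_word` are hypothetical stand-ins for a SymSpell lookup, filled in with the corrections from the desired output above.

```python
# Hypothetical stand-in for a SymSpell lookup: maps the misspellings to fixes.
TOY_FIXES = {"doag": "dog", "tseken": "chicken", "phisboal": "fishbowl",
             "melk": "milk", "tsokoleyt": "chocolate"}


def correct_word(word):
    """Return the best correction for one word (hypothetical helper)."""
    return TOY_FIXES.get(word, word)


def correct_nested(collection):
    corrected_collection = []          # outer list: one entry per sublist
    for sublist in collection:
        corrected_list = []            # fresh inner list for this sublist
        for word in sublist:
            corrected_list.append(correct_word(word))
        corrected_collection.append(corrected_list)  # store it, keeping the nesting
    return corrected_collection


collection_words = [['cat', 'doag', 'tseken'], ['phisboal', 'melk', 'tsokoleyt'],
                    ['eagle', 'elephant', 'bare']]
print(correct_nested(collection_words))
# [['cat', 'dog', 'chicken'], ['fishbowl', 'milk', 'chocolate'], ['eagle', 'elephant', 'bare']]
```

To use symspellpy instead, `correct_word` could return `sym_spell.lookup_compound(word, max_edit_distance_lookup)[0].term`. This assumes a dictionary has been loaded with `load_dictionary` and that each lookup returns at least one suggestion. The equivalent comprehension is `[[correct_word(w) for w in sublist] for sublist in collection]`.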

Any suggestions?

0 answers