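I am trying to implement SymSpell in Python 3 (64-bit). I have a 20 MB txt file containing words with their frequencies, and I can load it successfully into a dictionary named originalDictionary.
As the next step, for each word in that dictionary I delete one character at a time and add the modified word to another dictionary named editedDictionary. However, I run into a MemoryError.
I am running this on Windows 10 (x64) with 16 GB of RAM.
How can I solve this problem?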
    for word in originalDictionary:
        for i in range(len(word)):
            edit1 = word[0:i] + word[i + 1:]
            if edit1 not in editedDictionary:
                editedDictionary[edit1] = [word]
            else:
                editedDictionary[edit1].append(word)
Here is the error:
    Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\pydevd.py", line 1741, in <module>
        main()
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\pydevd.py", line 1735, in main
        globals = debugger.run(setup['file'], None, None, is_module)
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\pydevd.py", line 1135, in run
        pydev_imports.execfile(file, globals, locals) # execute the script
      File "C:\Program Files\JetBrains\PyCharm 2018.3.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/ee/PycharmProjects/SymSpell/spellCorrector.py", line 98, in <module>
        createDictionaries()
      File "C:/Users/ee/PycharmProjects/SymSpell/spellCorrector.py", line 40, in createDictionaries
        editedDictionary[edit1] = [word]
    MemoryError
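Answer 0 (score: 0)
In the latest version of the SymSpell algorithm you can define a prefix length. Deletes are generated only within this prefix. A shorter prefix length significantly reduces memory consumption, at the cost of slower lookups. A prefix length of 5 is usually a good choice.
There is a Python port of SymSpell that supports setting the prefix length: https://github.com/mammothb/symspellpy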
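To illustrate the prefix idea, here is a minimal sketch that generates single-character deletes only inside the first few characters of each word. It is not the actual SymSpell or symspellpy implementation; the names prefix_deletes and build_delete_index are hypothetical, and the sketch assumes edit distance 1, as in the question's loop. Unlike that loop, it also stores the untouched prefix as a key.

```python
from collections import defaultdict


def prefix_deletes(word, prefix_length=5):
    """Return the word's prefix plus every string obtained by
    deleting exactly one character from that prefix.

    Hypothetical helper for illustration, not part of any library.
    """
    prefix = word[:prefix_length]
    deletes = {prefix}  # keep the untouched prefix as a key as well
    for i in range(len(prefix)):
        deletes.add(prefix[:i] + prefix[i + 1:])
    return deletes


def build_delete_index(words, prefix_length=5):
    """Map each prefix delete to the list of words that produce it."""
    index = defaultdict(list)
    for word in words:
        for key in prefix_deletes(word, prefix_length):
            index[key].append(word)
    return index


if __name__ == "__main__":
    index = build_delete_index(["hello", "help", "helicopter"], prefix_length=3)
    for key in sorted(index):
        print(key, index[key])
```

With this approach a word contributes at most prefix_length + 1 keys regardless of its length, and words that share a prefix share the same keys, so the number of distinct keys grows far more slowly than when deleting across the whole word. The trade-off is that each key matches more candidate words, which must be verified at lookup time.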