Question

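I want to access nltk.corpus.wordnet in a multithreaded environment. As soon as I enable multithreading, methods such as synsets() fail. If I disable it, everything works.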

The error message varies from run to run. For example, it can look like this, which looks a lot like a race condition to me:

File "/home/lhk/anaconda3/envs/dlab/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1342, in synset_from_pos_and_offset
    assert synset._offset == offset

There are other related questions:

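The problem here is also caused by multithreading: What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?
This question has a more general title but seems to describe the same problem (corpus loading fails under multithreading): Python NLTK multi threading
A GitHub issue on the same topic: https://github.com/nltk/nltk/issues/1576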

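The solution to the first linked question is to load the corpus before the program branches into separate threads. I have done that: wordnet.ensure_loaded() is called before any threads are started.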

The suggestion in the GitHub issue is to import wordnet inside my thread function. That did not change anything.

Answer 1

One workaround is to make a deep copy of the corpus for each thread. Of course, this takes a lot of memory and is not very efficient:

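import copy
from nltk.corpus import wordnet as wn

# load the corpus once, before any threads are started
wn.ensure_loaded()

# at the beginning of the multi-threaded environment:
# give each thread its own private copy and use my_wn instead of wn
my_wn = copy.deepcopy(wn)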
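
The per-thread copy idea above can be sketched without NLTK installed. This is only an illustration under stated assumptions: FakeReader is a hypothetical stand-in for a reader object that keeps mutable state between calls, and get_reader and worker are made-up helper names, not part of NLTK. Each thread lazily creates its own deep copy and stores it in threading.local, so the copy is made once per thread rather than once per call.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeReader:
    """Hypothetical stand-in for a reader that is not safe to share."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self._cursor = None  # mutable state touched on every lookup

    def lookup(self, key):
        # Two steps through shared state: unsafe if threads interleave here.
        self._cursor = key
        return self._entries[self._cursor]


# Loaded once, before any threads are started.
shared_reader = FakeReader({"dog": 1, "cat": 2, "bird": 3})

# Holds one private reader per thread.
_local = threading.local()


def get_reader():
    """Return this thread's private deep copy, creating it on first use."""
    if not hasattr(_local, "reader"):
        _local.reader = copy.deepcopy(shared_reader)
    return _local.reader


def worker(key):
    reader = get_reader()
    return key, reader.lookup(key), id(reader)


with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(worker, ["dog", "cat", "bird", "dog"]))
```

In the setting of this answer, shared_reader would presumably be wn after wn.ensure_loaded() has been called. The memory cost mentioned above still applies, since every worker thread holds a full copy.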
