I'm trying to save NLTK's HMM tagger with pickle, as shown below, but it gives me the following error. Please suggest a solution.
>>> import nltk
>>> import pickle
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> hmm_tagger=nltk.HiddenMarkovModelTagger.train(brown_a)
>>> sent = nltk.corpus.brown.sents()[400]
>>> hmm_tagger.tag(sent)
[(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
>>> f = open('my_tagger.pickle', 'wb')
>>> pickle.dump(hmm_tagger, f)
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    pickle.dump(hmm_tagger, f)
  File "C:\Python27\lib\pickle.py", line 1376, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function estimator at 0x0575F6F0>: it's not found as nltk.tag.hmm.estimator
>>>
I'm using Python 2.7.11 with NLTK 3.1 on MS Windows 10.
Thanks in advance.
Answer 0 (score: 0)
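Why do you want to pickle the model? Training on the Brown corpus is very fast. If you want a better part-of-speech tagger, consider the easy-to-use spaCy (https://spacy.io/), which has good pickling support and produces state-of-the-art results. Frankly, HMM taggers perform rather poorly by today's standards.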
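Anyway, this is an NLTK bug: the trained tagger holds a reference to a function named estimator, and pickle saves functions by module and name, but estimator is not a module-level attribute of nltk.tag.hmm, so pickle cannot find it there. Three options:
… find it in nltk.tag.hmm.estimator
Here is how to dump the tagger using dill: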
import nltk
import dill

brown_a = nltk.corpus.brown.tagged_sents()[:300]
hmm_tagger = nltk.HiddenMarkovModelTagger.train(brown_a)
sent = nltk.corpus.brown.sents()[400]
print(hmm_tagger.tag(sent))
# [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]

# dill offers the same dump/load interface as pickle
with open('my_tagger.dill', 'wb') as f:
    dill.dump(hmm_tagger, f)
Now you can load the tagger:
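import nltk
import dill

# Recreate the test sentence, since this may run in a fresh session
sent = nltk.corpus.brown.sents()[400]
with open('my_tagger.dill', 'rb') as f:
    hmm_tagger = dill.load(f)
print(hmm_tagger.tag(sent))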
# [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
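The dill route works because dill can serialize a function's code by value, while pickle only records a reference (module plus name) and must be able to look that name up again. The sketch below reproduces the same kind of failure with the standard library alone, no NLTK required. FactoryHolder and both estimator functions are hypothetical stand-ins for illustration, not NLTK code: the holder mimics an object that keeps the factory function it was built with.

```python
import pickle


class FactoryHolder(object):
    """Hypothetical stand-in for an object that stores the factory
    function it was built with (not an NLTK class)."""

    def __init__(self, factory):
        self.factory = factory


def make_holder_with_local_factory():
    # This estimator is defined inside another function, so it is not an
    # attribute of the module and pickle cannot look it up by name.
    def estimator(counts):
        return sorted(counts)
    return FactoryHolder(estimator)


def estimator(counts):
    # Module-level version: pickle records it by module and name
    # and re-imports it by that name when loading.
    return sorted(counts)


def can_pickle(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError):
        # Python 2 raises PicklingError for a local function; Python 3
        # raises AttributeError or PicklingError depending on the version.
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(make_holder_with_local_factory()))  # False
    print(can_pickle(FactoryHolder(estimator)))          # True
```

The module-level variant shows what making the function findable amounts to: plain pickle handles a stored function only if it can be imported again under the same module and name when loading.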