Training an NER model with a custom IOB corpus (using NLTK, train_chunker.py)

Asked: 2018-11-23 06:09:50

Tags: nltk ner tagged-corpus

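I built a custom IOB corpus to train a custom NER model. The corpus has one token per line, with three space-separated columns (word, POS tag, IOB tag):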

Liver NNP B-Loc
, , O
needle JJ O
biopsy NN O
: : O
Chronic JJ B-Pathol
hepatitis NN I-Pathol
, , O
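
To make the intended reading of this format explicit, here is a minimal sketch in plain Python (no NLTK) that groups the lines above into labelled spans. The function name `iob_to_spans` is hypothetical, and the sketch assumes exactly three whitespace-separated columns per token line.

```python
def iob_to_spans(text):
    """Group (word, pos, iob) lines into (label, [words]) spans.

    Tokens tagged O are skipped; B-X starts a span and I-X extends it.
    """
    spans = []
    current = None  # (label, words) of the span being built, if any
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            current = None      # blank or malformed line ends any open span
            continue
        word, _pos, tag = parts
        if tag.startswith("B-"):
            current = (tag[2:], [word])
            spans.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(word)
        else:
            current = None      # "O", or an I- tag with no matching open span
    return spans


sample = """Liver NNP B-Loc
, , O
needle JJ O
biopsy NN O
: : O
Chronic JJ B-Pathol
hepatitis NN I-Pathol
, , O
"""

print(iob_to_spans(sample))
# [('Loc', ['Liver']), ('Pathol', ['Chronic', 'hepatitis'])]
```

If the corpus is meant to encode something other than these two entities for the sample, the tags themselves would need another look before any reader is involved.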

[Command]

python train_chunker.py ../iob_corpus --reader nltk.corpus.reader.chunked.ChunkedCorpusReader

[Result]

D:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
D:\Anaconda3\lib\site-packages\sklearn\ensemble\weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
loading ../iob_corpus
Traceback (most recent call last):
  File "train_chunker.py", line 145, in <module>
    nchunks = len(chunk_trees)
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 380, in __len__
    for tok in self.iterate_from(self._offsets[-1]): pass
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 402, in iterate_from
    for tok in piece.iterate_from(max(0, start_tok-offset)):
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\chunked.py", line 179, in read_block
    target_tagset=self._target_tagset)
  File "D:\Anaconda3\lib\site-packages\nltk\chunk\util.py", line 355, in tagstr2tree
    raise ValueError('Expected ] at char {:d}'.format(len(s)))
ValueError: Expected ] at char 8

I don't know what causes this error. Is there a problem with the corpus format?
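
One way to narrow this down: the ValueError is raised inside `tagstr2tree`, and as far as I understand it that parser handles bracketed chunk text such as `[ the/DT cat/NN ]` rather than column-based IOB lines. If so, `Expected ]` would point at a literal `[` that is never closed within one block of the corpus. The sketch below is a simplified stand-in for that bracket bookkeeping, not NLTK's own code; `check_brackets`, `find_bracket_lines` and the sample strings are hypothetical.

```python
import re

# Simplified stand-in for the bracket bookkeeping of a bracketed-chunk parser.
# Illustration only; this is not NLTK's implementation.
TOKEN_OR_BRACKET = re.compile(r"\[|\]|[^\[\]\s]+")


def check_brackets(block):
    """Return None if every '[' is closed by a ']', else an error message."""
    depth = 0
    for match in TOKEN_OR_BRACKET.finditer(block):
        text = match.group()
        if text == "[":
            if depth != 0:
                return "Unexpected [ at char {:d}".format(match.start())
            depth = 1
        elif text == "]":
            if depth != 1:
                return "Unexpected ] at char {:d}".format(match.start())
            depth = 0
    if depth != 0:
        return "Expected ] at char {:d}".format(len(block))
    return None


def find_bracket_lines(corpus_text):
    """List (line_number, line) pairs that contain a literal '[' or ']'."""
    return [
        (n, line)
        for n, line in enumerate(corpus_text.splitlines(), 1)
        if "[" in line or "]" in line
    ]


iob_block = "Liver NNP B-Loc\n, , O\nneedle JJ O\n"
bad_block = "[ ( O"  # hypothetical token line containing a literal bracket

print(check_brackets(iob_block))
print(check_brackets(bad_block))
print(find_bracket_lines("needle JJ O\n" + bad_block + "\n"))
```

Running `find_bracket_lines` over each corpus file would show whether any token contains a bracket. NLTK also provides `nltk.corpus.reader.conll.ConllChunkCorpusReader` for column-style files; whether train_chunker.py accepts it with these labels is not confirmed here.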

A similar error occurred when I built the IOB corpus in the following two-column format (word, IOB tag):

Liver B-Loc
, O
needle O
biopsy O
: O
Focal O
nodular O
hyperplasia O
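
Since both a three-column and a two-column layout were tried, a quick consistency check on the files can rule out simple formatting slips before looking at the reader. The following is a minimal sketch in plain Python; `validate_iob` is a hypothetical helper, and it assumes the IOB tag is always the last column and that blank lines separate sentences.

```python
import re

TAG_RE = re.compile(r"^(O|[BI]-\S+)$")


def validate_iob(text, n_columns):
    """Return a list of (line_number, message) problems found in IOB text."""
    problems = []
    prev_label = None  # label of the chunk the previous token belonged to
    for n, line in enumerate(text.splitlines(), 1):
        parts = line.split()
        if not parts:  # blank line, treated as a sentence boundary
            prev_label = None
            continue
        if len(parts) != n_columns:
            problems.append(
                (n, "expected %d columns, found %d" % (n_columns, len(parts)))
            )
            prev_label = None
            continue
        tag = parts[-1]
        if not TAG_RE.match(tag):
            problems.append((n, "unrecognised IOB tag %r" % tag))
            prev_label = None
        elif tag.startswith("I-") and tag[2:] != prev_label:
            problems.append((n, "I- tag without a preceding B-/I- of the same type"))
            prev_label = tag[2:]
        else:
            prev_label = None if tag == "O" else tag[2:]
    return problems


two_col = (
    "Liver B-Loc\n, O\nneedle O\nbiopsy O\n: O\n"
    "Focal O\nnodular O\nhyperplasia O\n"
)

print(validate_iob(two_col, n_columns=2))  # [] : nothing flagged for this sample
print(len(validate_iob(two_col, n_columns=3)))  # 8 : every line has too few columns
```

A clean result here only says the columns and tags are self-consistent; it does not show that a given corpus reader accepts the layout.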

0 answers:

No answers yet