Question

I built a custom IOB corpus for a custom NER model. The corpus format is as follows.

Liver NNP B-Loc
, , O
needle JJ O
biopsy NN O
: : O
Chronic JJ B-Pathol
hepatitis NN I-Pathol
, , O

[Command]

python train_chunker.py ../iob_corpus --reader nltk.corpus.reader.chunked.ChunkedCorpusReader

[Result]

D:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
D:\Anaconda3\lib\site-packages\sklearn\ensemble\weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
loading ../iob_corpus
Traceback (most recent call last):
  File "train_chunker.py", line 145, in <module>
    nchunks = len(chunk_trees)
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 380, in __len__
    for tok in self.iterate_from(self._offsets[-1]): pass
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 402, in iterate_from
    for tok in piece.iterate_from(max(0, start_tok-offset)):
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\chunked.py", line 179, in read_block
    target_tagset=self._target_tagset)
  File "D:\Anaconda3\lib\site-packages\nltk\chunk\util.py", line 355, in tagstr2tree
    raise ValueError('Expected ] at char {:d}'.format(len(s)))
ValueError: Expected ] at char 8

I don't know what is causing this error. Is there a problem with the corpus format?
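
The `Expected ]` message comes from `tagstr2tree`, which suggests the reader is treating square brackets as chunk delimiters (the bracketed `[ word/TAG ... ]` style) rather than reading IOB columns. One way to narrow this down is to list any corpus lines that contain a bracket character. The sketch below is only an illustration: `find_bracket_lines` is a hypothetical helper, not part of NLTK, and the `[ -LRB- O` line in the sample is invented for the demonstration and does not come from the corpus shown here.

```python
def find_bracket_lines(text):
    """Return (line_number, line) pairs for lines containing '[' or ']'."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "[" in line or "]" in line:
            hits.append((lineno, line))
    return hits


# Invented sample: the second line is a made-up token line with a bracket.
sample = "Liver NNP B-Loc\n[ -LRB- O\nbiopsy NN O\n"
bracket_hits = find_bracket_lines(sample)
for lineno, line in bracket_hits:
    print(f"line {lineno}: {line}")
```

If brackets do turn up, that would be consistent with the traceback. NLTK also provides `ConllChunkCorpusReader` for column-per-token chunk data, which may be a closer match for this layout than `ChunkedCorpusReader`; whether it fits this training script is an assumption to verify.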

A similar error occurred when I built the IOB chunker corpus as follows.

Liver B-Loc
, O
needle O
biopsy O
: O
Focal O
nodular O
hyperplasia O
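
To check that either layout above is well-formed IOB independently of any corpus reader, the files can be parsed with plain Python. This is a minimal sketch under stated assumptions: columns are whitespace-separated, blank lines separate sentences, and the last column is the IOB tag. The names `read_iob` and `extract_chunks` are hypothetical helpers, not NLTK functions.

```python
def read_iob(text):
    """Parse column-per-token IOB text into sentences.

    Each token line is 'word tag' or 'word pos tag'; blank lines separate
    sentences. Returns a list of sentences, each a list of
    (word, pos_or_None, iob_tag) tuples.
    """
    sentences, current = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        if len(fields) == 2:
            word, pos, tag = fields[0], None, fields[1]
        elif len(fields) == 3:
            word, pos, tag = fields
        else:
            raise ValueError(
                f"line {lineno}: expected 2 or 3 columns, got {len(fields)}"
            )
        if tag != "O" and not tag.startswith(("B-", "I-")):
            raise ValueError(f"line {lineno}: bad IOB tag {tag!r}")
        current.append((word, pos, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_chunks(sentence):
    """Collect (label, [words]) spans from one parsed sentence."""
    chunks = []
    label, words = None, []
    for word, _pos, tag in sentence:
        if tag.startswith("I-") and label == tag[2:]:
            words.append(word)  # continue the open chunk
            continue
        if label is not None:
            chunks.append((label, words))  # close the open chunk
            label, words = None, []
        if tag != "O":
            # B-X, or an I-X with no matching open chunk, starts a new chunk
            label, words = tag[2:], [word]
    if label is not None:
        chunks.append((label, words))
    return chunks


three_col = (
    "Liver NNP B-Loc\n, , O\nneedle JJ O\nbiopsy NN O\n: : O\n"
    "Chronic JJ B-Pathol\nhepatitis NN I-Pathol\n, , O\n"
)
for sentence in read_iob(three_col):
    print(extract_chunks(sentence))
```

If both files parse cleanly this way, the tags themselves are probably fine and the mismatch is more likely between the file layout and the reader class passed to `--reader`.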

Training an NER model with a custom IOB corpus (using NLTK, train_chunker.py)

0 answers: