I built a custom IOB corpus for a custom NER model. The corpus format is as follows.
Liver NNP B-Loc
, , O
needle JJ O
biopsy NN O
: : O
Chronic JJ B-Pathol
hepatitis NN I-Pathol
, , O
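To make the intended reading of these columns concrete (word, POS tag, IOB tag), here is a minimal sketch that does not use NLTK at all. The function names `parse_iob_lines` and `iob_to_chunks` are hypothetical helpers written only for this illustration; they are not part of NLTK or of train_chunker.py. The sketch assumes whitespace-separated fields and one token per line.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, IOB tag)


def parse_iob_lines(text: str) -> List[Token]:
    """Split every non-blank line into exactly three whitespace-separated fields."""
    tokens: List[Token] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are simply skipped in this sketch
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        tokens.append((fields[0], fields[1], fields[2]))
    return tokens


def iob_to_chunks(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Group B-/I- tagged tokens into (label, [words]) chunks."""
    chunks: List[Tuple[str, List[str]]] = []
    current = None  # the chunk currently being extended, if any
    for word, _pos, tag in tokens:
        if tag.startswith("B-"):
            current = (tag[2:], [word])
            chunks.append(current)
        elif tag.startswith("I-"):
            # An I- tag must continue a chunk of the same type.
            if current is None or current[0] != tag[2:]:
                raise ValueError(f"{word!r}: {tag} does not continue a {tag[2:]} chunk")
            current[1].append(word)
        elif tag == "O":
            current = None
        else:
            raise ValueError(f"{word!r}: unknown IOB tag {tag!r}")
    return chunks


# The sample lines from the corpus excerpt above.
SAMPLE = """\
Liver NNP B-Loc
, , O
needle JJ O
biopsy NN O
: : O
Chronic JJ B-Pathol
hepatitis NN I-Pathol
, , O
"""

if __name__ == "__main__":
    print(iob_to_chunks(parse_iob_lines(SAMPLE)))
```

Read this way, the excerpt contains one Loc chunk ("Liver") and one Pathol chunk ("Chronic hepatitis"), so the IOB tags themselves look consistent to me.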
[Command]
python train_chunker.py ../iob_corpus --reader nltk.corpus.reader.chunked.ChunkedCorpusReader
[Result]
D:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
D:\Anaconda3\lib\site-packages\sklearn\ensemble\weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
loading ../iob_corpus
Traceback (most recent call last):
  File "train_chunker.py", line 145, in <module>
    nchunks = len(chunk_trees)
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 380, in __len__
    for tok in self.iterate_from(self._offsets[-1]): pass
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 402, in iterate_from
    for tok in piece.iterate_from(max(0, start_tok-offset)):
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "D:\Anaconda3\lib\site-packages\nltk\corpus\reader\chunked.py", line 179, in read_block
    target_tagset=self._target_tagset)
  File "D:\Anaconda3\lib\site-packages\nltk\chunk\util.py", line 355, in tagstr2tree
    raise ValueError('Expected ] at char {:d}'.format(len(s)))
ValueError: Expected ] at char 8
I don't know what causes this error. Is there something wrong with the format of my corpus?
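A similar error occurred when I built the IOB chunker corpus in the following two-column format (word and IOB tag only, no POS tag).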
Liver B-Loc
, O
needle O
biopsy O
: O
Focal O
nodular O
hyperplasia O