Python spaCy error: RuntimeError: Language not supported

Date: 2017-05-25 09:20:59

Tags: python dictionary entity spacy

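I want to add new entities to my own spaCy data model, "mymodel". I installed "mymodel" by following this tutorial, and up to that point everything worked well. But when I try to use "mymodel" to add new entities, I get an error. Please help.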

Here is my code:

import plac

from spacy.en import English
from spacy.gold import GoldParse
import spacy
nlp = spacy.load('mymodel')

def main(out_loc):
    nlp = English(parser=False) # Avoid loading the parser, for quick load times
    # Run the tokenizer and tagger (but not the entity recognizer)
    doc = nlp.tokenizer(u'Lions and tigers and grizzly bears!')
    nlp.tagger(doc) 

    nlp.entity.add_label('ANIMAL') # <-- New in v0.100

    # Create a GoldParse object. This should have a better API...
    indices = tuple(range(len(doc)))
    words = [w.text for w in doc]
    tags = [w.tag_ for w in doc]
    heads = [0 for _ in doc]
    deps = ['' for _ in doc]
    # This is the only part we care about. We want BILOU format
    ner = ['U-ANIMAL', 'O', 'U-ANIMAL', 'O', 'B-ANIMAL', 'L-ANIMAL', 'O']

    # Create the GoldParse
    annot = GoldParse(doc, (indices, words, tags, heads, deps, ner))

    # Update the weights with the example
    # Here we iterate until we get it entirely correct. In practice this is probably a bad idea.
    # Note that we've added a class to the existing model here! We "resume"
    # training the previous model. Whether this is good or not I can't say, you'll have to
    # experiment.
    loss = nlp.entity.train(doc, annot)
    i = 0
    while loss != 0 and i < 1000:
        loss = nlp.entity.train(doc, annot)
        i += 1
    print("Used %d iterations" % i)

    nlp.entity(doc)
    for ent in doc.ents:
        print(ent.text, ent.label_)
    nlp.entity.model.dump(out_loc)

if __name__ == '__main__':
    plac.call(main)

**Error output:**

Traceback (most recent call last):
  File "/home/vv/webapp/dic_model.py", line 7, in <module>
    nlp = spacy.load('mymodel')
  File "/usr/local/lib/python3.5/dist-packages/spacy/__init__.py", line 26, in load
    lang_name = util.get_lang_class(name).lang
  File "/usr/local/lib/python3.5/dist-packages/spacy/util.py", line 27, in get_lang_class
    raise RuntimeError('Language not supported: %s' % name)
RuntimeError: Language not supported: mymodel

1 answer:

Answer 0 (score: 2)

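The problem here is that spacy.load() currently expects either a language ID (e.g. 'en') or a shortcut link that tells spaCy where to find the data. Because spaCy can't find a shortcut link with that name, it assumes that 'my_model' is a language, and no such language exists.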
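The lookup order described above can be sketched roughly as follows. This is only an illustration, not spaCy's actual source: the function name `resolve_model_name` and the `links` / `languages` arguments are made up for this example, and only the error message is taken from the traceback in the question.

```python
def resolve_model_name(name, links, languages):
    """Mimic the lookup order described above (hypothetical helper, not spaCy API).

    links:     dict mapping shortcut-link names to data paths
    languages: set of known language IDs, e.g. {'en', 'de'}
    """
    # 1. A shortcut link is used if one exists under this name.
    if name in links:
        return ('link', links[name])
    # 2. Otherwise the name is treated as a language ID.
    if name in languages:
        return ('language', name)
    # 3. Neither matched: raise the same message seen in the traceback.
    raise RuntimeError('Language not supported: %s' % name)


if __name__ == '__main__':
    print(resolve_model_name('en', links={}, languages={'en'}))
    try:
        resolve_model_name('mymodel', links={}, languages={'en'})
    except RuntimeError as err:
        print(err)
```

Under these assumptions, a name that is neither a link nor a language ID ends at step 3, which matches what the asker sees.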

You can set up a link for this model:

python -m spacy link my_model my_model # if it's installed via pip, or:
python -m spacy link /path/to/my_model/data my_model

This will create a symlink in the /spacy/data directory, so you should run it with admin permissions.
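To check whether a link is in place before calling spacy.load(), something like the sketch below may help. It assumes that links sit directly inside a data directory, as described above. `find_shortcut_link` and its arguments are hypothetical names, and the directory has to be passed in explicitly because the helper does not ask spaCy where its data lives.

```python
from pathlib import Path


def find_shortcut_link(data_dir, name):
    """Return the resolved target of an entry called `name` in `data_dir`, or None.

    Hypothetical helper: it only inspects the filesystem and never imports spaCy.
    """
    candidate = Path(data_dir) / name
    # exists() follows symlinks, so a dangling link is reported as missing.
    if not candidate.exists():
        return None
    return candidate.resolve()
```

If this returns None for the name you pass to spacy.load(), the link step has probably not been run, or the link points at a path that no longer exists.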

Alternatively, if you've created a model package that can be installed via pip, you can simply install it, import it, and call its load() method with no arguments:

import my_model
nlp = my_model.load()

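In some cases, this way of loading models is actually more convenient, since it's cleaner and lets you debug your code more easily. For example, if the model isn't installed, Python will raise an ImportError immediately. Similarly, if loading fails, you know the problem is most likely in the model's own loading code or its meta data.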
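The two failure modes mentioned above can be separated with a small wrapper. This is a minimal sketch: `load_packaged_model` is a hypothetical helper, and it assumes only that the installed package exposes a load() function, as in the snippet above.

```python
import importlib


def load_packaged_model(package_name):
    """Import a pip-installed model package and call its load() (hypothetical helper)."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        # The package, or something it imports, could not be imported:
        # this points at an installation problem rather than a data problem.
        raise ImportError(
            "Model package %r could not be imported; is it installed via pip?"
            % package_name
        )
    # Anything raised from here on comes from the model's own loading code or meta data.
    return module.load()
```

Calling `load_packaged_model('my_model')` then either returns the loaded pipeline, fails early with a clear ImportError, or propagates whatever the package's load() raised.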

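By the way: I'm one of the spaCy maintainers, and I agree that the way spacy.load() currently works is far from ideal and confusing. We're looking forward to finally changing this in the next major release. We're very close to releasing the first alpha of v2.0, which will solve this problem much more elegantly and will also include a lot of improvements to the training process and the documentation.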