spaCy PhraseMatcher: can't get the matcher name

Posted: 2018-08-24 04:14:14

Tags: spacy

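I'm new to PhraseMatcher and want to extract some keywords from my emails.
Everything works fine, except that I can't get the name of the pattern I added to the matcher.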

Here is my code:

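import plac
import spacy
from spacy.matcher import PhraseMatcher

# read_gazetter() and read_text() are the asker's own helpers (not shown).
# n (number of texts to read) was undefined in the snippet; it is made a
# parameter here, and its default value is only illustrative.
def main(n=10000):
    patterns_months = 'phraseMatcher/months.txt'
    text_loc = 'phraseMatcher/text.txt'
    nlp = spacy.blank('en')
    nlp.vocab.lex_attr_getters = {}  # skip computing lexical attributes
    phrases_months = read_gazetter(patterns_months)
    txts = read_text(text_loc, n=n)

    # One Doc per month phrase, used as PhraseMatcher patterns
    months = [nlp(text) for text in phrases_months]

    matcher = PhraseMatcher(nlp.vocab)
    matcher.add('MONTHS', None, *months)  # spaCy v2 signature: key, on_match, *docs
    print(nlp.vocab.strings['MONTHS'])    # hash of the label 'MONTHS'

    for txt in txts:
        doc = nlp(txt)
        matches = matcher(doc)

        for match_id, start, end in matches:
            span = doc[start:end]
            label = nlp.vocab.strings[match_id]  # hash -> label; this line raises the KeyError
            print(label, span.text, start, end)


if __name__ == '__main__':
    plac.call(main)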
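`nlp.vocab.strings` is used in both directions in the code above: indexing with a string gives its 64-bit hash, and indexing with a hash gives the string back, but only if that exact hash has been stored; otherwise spaCy raises the E018 KeyError shown in the output below. The toy model here is a minimal sketch of that behavior using only the standard library. It is an illustration, not spaCy's implementation: `ToyStringStore` is hypothetical, `add()` merely plays the role of registering a label, and an 8-byte BLAKE2b digest stands in for spaCy's own hash function, so the numbers differ from spaCy's.

```python
import hashlib


class ToyStringStore:
    """Hypothetical two-way map between strings and 64-bit hashes."""

    def __init__(self):
        self._by_hash = {}

    @staticmethod
    def _hash(text):
        # Stand-in 64-bit hash; spaCy uses a different hash function.
        digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, text):
        # Store the string so that its hash can be looked up later.
        key = self._hash(text)
        self._by_hash[key] = text
        return key

    def __getitem__(self, key):
        if isinstance(key, str):
            # string -> hash: computed on the fly, nothing is stored
            return self._hash(key)
        if key not in self._by_hash:
            # hash -> string only works for hashes that were stored
            raise KeyError(f"Can't retrieve string for hash '{key}'.")
        return self._by_hash[key]


strings = ToyStringStore()
months_id = strings.add("MONTHS")      # registering a label stores its string
print(strings["MONTHS"] == months_id)  # -> True
print(strings[months_id])              # -> MONTHS

try:
    strings[months_id + 1]  # a hash that was never stored
except KeyError as err:
    print(err)
```

The point of the toy: a reverse lookup can only fail if the number passed in is not a hash that was ever stored, so a KeyError on `match_id` means the matcher returned a different number than the hash of 'MONTHS'.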

Output:

12298211501233906429          <--- this is from print(nlp.vocab.strings['MONTHS'])
Traceback (most recent call last):
  File "D:/workspace/phraseMatcher/venv/phraseMatcher.py", line 71, in <module>
    plac.call(main)
  File "D:\workspace\phraseMatcher\venv\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "D:\workspace\phraseMatcher\venv\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "D:/workspace/phraseMatcher/venv/phraseMatcher.py", line 47, in main
    label = nlp.vocab.strings[match_id]
  File "strings.pyx", line 117, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '18446744072093410045'."
  • spaCy version: 2.0.12
  • Platform: Windows-7-6.1.7601-SP1
  • Python version: 3.7.0
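The two numbers in the output are related. Keeping only the low 32 bits of the hash printed for 'MONTHS' (12298211501233906429), reading them as a signed 32-bit integer, and reinterpreting that negative value as an unsigned 64-bit integer gives exactly the hash in the error message (18446744072093410045). So the `match_id` looks like the correct hash after passing through a 32-bit signed integer somewhere; one plausible cause is that the C type `long` is only 32 bits wide on Windows, though this check verifies only the arithmetic, not where the truncation happens. The helper name `truncate_like_32bit_long` is hypothetical.

```python
def truncate_like_32bit_long(h):
    """Hypothetical helper: keep the low 32 bits of h as a signed 32-bit int,
    then reinterpret that value as an unsigned 64-bit int."""
    low = h & 0xFFFFFFFF               # low 32 bits
    if low >= 2**31:                   # sign bit set -> negative as int32
        low -= 2**32
    return low & 0xFFFFFFFFFFFFFFFF    # two's-complement view as uint64


expected = 12298211501233906429  # print(nlp.vocab.strings['MONTHS'])
returned = 18446744072093410045  # hash from the KeyError message

print(truncate_like_32bit_long(expected))              # -> 18446744072093410045
print(truncate_like_32bit_long(expected) == returned)  # -> True
```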

I can't figure out what I'm doing wrong. It should be simple, and I have already read this:
Using PhraseMatcher in SpaCy to find multiple match types
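A possible workaround, sketched under an assumption: that every wrong `match_id` is the label's correct 64-bit hash with only its low 32 bits kept and sign-extended. The numbers in the output are consistent with this, since 12298211501233906429 and 18446744072093410045 share the same low 32 bits. The idea is to build a lookup table that maps both the full hash and its truncated form back to the label, and to consult it instead of `nlp.vocab.strings[match_id]`. The names `sign_extend_low32` and `build_label_lookup` are hypothetical, not spaCy API.

```python
def sign_extend_low32(h):
    """Keep the low 32 bits of h, read them as signed int32, return as uint64."""
    low = h & 0xFFFFFFFF
    if low >= 2**31:
        low -= 2**32
    return low & 0xFFFFFFFFFFFFFFFF


def build_label_lookup(label_hashes):
    """Map both the full hash and its truncated form back to the label name.

    label_hashes: dict of label -> 64-bit hash; in spaCy it could be built as
    {label: nlp.vocab.strings[label] for label in ('MONTHS',)}.
    """
    lookup = {}
    for label, h in label_hashes.items():
        lookup[h] = label
        lookup[sign_extend_low32(h)] = label
    return lookup


# Hash printed for 'MONTHS' in the output above.
lookup = build_label_lookup({"MONTHS": 12298211501233906429})
print(lookup[18446744072093410045])  # -> MONTHS
```

In the loop of the code above, `label = nlp.vocab.strings[match_id]` would become `label = lookup[match_id]`. Two different labels could in principle truncate to the same 32-bit value; this sketch does not detect that collision.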

Please help, thanks.

0 answers