Question

I get the following error when initializing a new PhraseMatcher with a list of terms:

ValueError: Pattern length (11) >= phrase_matcher.max_length (10). Length can be set on initialization, up to 10.

    from spacy.matcher import PhraseMatcher

    patterns = [nlp(org) for org in fields]
    self.matcher = PhraseMatcher(nlp.vocab)
    self.matcher.add('FIELD', None, *patterns)

Answer 1

Currently, a single pattern cannot be longer than 10 tokens:

    # Allowed
    'one two three four five six seven eight nine ten'
    # Not allowed
    'one two three four five six seven eight nine ten eleven'

You could try raising the limit, e.g. self.matcher = PhraseMatcher(nlp.vocab, max_length=20), but IIRC 10 is a hard limit in the current version of spaCy.

See the relevant documentation at https://spacy.io/api/phrasematcher#init and the source at https://github.com/explosion/spacy/blob/master/spacy/matcher.pyx#L452
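To find out which of your terms hit the limit before calling matcher.add, you can count the tokens in each term and set the long ones aside. This is a minimal sketch, assuming the 10-token cutoff described above. whitespace_tokenize and partition_terms are hypothetical helpers; the whitespace split is only a stand-in so the example runs without spaCy. In practice you would count len(nlp.make_doc(term)), because spaCy's tokenizer can split text differently from str.split.

```python
from typing import Callable, Iterable, List, Tuple

# Cutoff as described in the answer above; adjust it if your spaCy version differs.
MAX_PATTERN_TOKENS = 10


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer; with spaCy, use nlp.make_doc(text) instead."""
    return text.split()


def partition_terms(
    terms: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
    max_tokens: int = MAX_PATTERN_TOKENS,
) -> Tuple[List[str], List[str]]:
    """Split terms into (usable, too_long) based on their token count."""
    usable: List[str] = []
    too_long: List[str] = []
    for term in terms:
        if len(tokenize(term)) > max_tokens:
            too_long.append(term)
        else:
            usable.append(term)
    return usable, too_long


fields = [
    "one two three four five six seven eight nine ten",
    "one two three four five six seven eight nine ten eleven",
]
usable, too_long = partition_terms(fields)
print(len(usable), len(too_long))  # 1 1
```

You can then turn only the usable terms into patterns and add them as in the question. The too_long terms need to be shortened or handled some other way.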

Answer 2

You could try defining the class as an entity matcher and looping over the various patterns/fields:

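    from spacy.matcher import PhraseMatcher
    from spacy.tokens import Span

    class EntityMatcher(object):
        name = 'entity_matcher'

        def __init__(self, nlp, terms, label):
            # Only tokenize each term; the full pipeline isn't needed for patterns
            patterns = [nlp.make_doc(text) for text in terms]
            self.matcher = PhraseMatcher(nlp.vocab)
            self.matcher.add(label, None, *patterns)

        def __call__(self, doc):
            matches = self.matcher(doc)
            for match_id, start, end in matches:
                # match_id is the hash of the label string
                span = Span(doc, start, end, label=match_id)
                # Append the new span to the entities already on the doc
                doc.ents = list(doc.ents) + [span]
            return doc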
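One thing the __call__ loop does not handle is overlap. PhraseMatcher can return overlapping matches (for example, when one term is a prefix of another), and spaCy rejects overlapping spans when you assign doc.ents. The sketch below is plain Python and works on the (match_id, start, end) triples before they become Span objects. It keeps the longest match in each overlapping group. drop_overlapping is a hypothetical helper, not a spaCy API.

```python
from typing import List, Set, Tuple

# (match_id, start, end) triples, the shape PhraseMatcher returns
Match = Tuple[int, int, int]


def drop_overlapping(matches: List[Match]) -> List[Match]:
    """Keep longer matches first and drop any match sharing a token with a kept one."""
    # Longest first; ties broken by earlier start position
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    kept: List[Match] = []
    taken: Set[int] = set()
    for match_id, start, end in ordered:
        tokens = set(range(start, end))
        if tokens & taken:
            continue  # overlaps a match that was already kept
        kept.append((match_id, start, end))
        taken |= tokens
    # Return the survivors in document order
    return sorted(kept, key=lambda m: m[1])


matches = [(1, 0, 2), (1, 0, 3), (1, 2, 4), (1, 5, 6)]
print(drop_overlapping(matches))  # [(1, 0, 3), (1, 5, 6)]
```

Inside __call__, you would loop over drop_overlapping(self.matcher(doc)) instead of the raw matches. Newer spaCy versions also ship spacy.util.filter_spans, which applies a similar longest-span-first rule directly to Span objects.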

Answer 3

The PhraseMatcher ValueError above is fixed in spaCy 2.1.4. If you get this error, upgrade your spaCy version. Reference: github issue link
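To check whether the installed version predates the release this answer cites, you can compare spacy.__version__ against 2.1.4. This is a minimal sketch with hypothetical parse_version and needs_upgrade helpers. It only looks at the leading number in each of the first three components, so a pre-release tag like 2.1.0a1 is treated as 2.1.0.

```python
import re
from typing import Tuple

# Release cited in the answer above as containing the fix
FIXED_IN = (2, 1, 4)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.1.3' (or '2.1.0a1') into a tuple of leading integers, e.g. (2, 1, 3)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def needs_upgrade(version: str) -> bool:
    """True if the given spaCy version is older than FIXED_IN."""
    return parse_version(version) < FIXED_IN


# In real code: import spacy; needs_upgrade(spacy.__version__)
print(needs_upgrade("2.0.18"))  # True
print(needs_upgrade("2.1.4"))   # False
```

If it returns True, you can upgrade with pip. Pinning below 3 (pip install "spacy>=2.1.4,<3") keeps the question's PhraseMatcher.add('FIELD', None, *patterns) call working, because spaCy 3 changed add to take the patterns as a list.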

Spacy PhraseMatcher ValueError: Pattern length (11) >= phrase_matcher.max_length (10)
