Can spaCy's rule-based matching match two keywords with at most a fixed number of wildcard tokens between them?

Asked: 2019-11-22 14:59:35

Tags: nlp pattern-matching spacy

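For example, I am trying to match two keywords with at most five wildcard tokens between them. I could add five patterns, each with a different number of wildcards, but that is not a good solution. Is there an option like {"OP": "+5"}, or some other solution?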

Example:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a really nice, green apple. One apple a day ...!")
matcher = Matcher(nlp.vocab)
pattern = [{'ORTH': 'is'}, {"OP": "+"}, {"ORTH": "apple"} ]
# spaCy v2 signature; spaCy v3 expects matcher.add('test', [pattern])
matcher.add('test', None, pattern)
spans = [doc[start:end] for match_id, start, end in matcher(doc)]
for span in spans:
    print(span)

This yields two matches:

is a really nice, green apple
is a really nice, green apple. One apple

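But I only want the first one. The solution should work in general, so splitting the text into sentences and the like is not an option.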

1 Answer:

Answer 0 (score: 0)

You can do the following, adding five optional wildcard tokens between the two keywords:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a really nice, green apple. One apple a day ...!")
matcher = Matcher(nlp.vocab)

pattern = [{'ORTH': 'is'}]
for i in range(0,5):
    pattern.append({"OP": "?"}) 
pattern.append({"ORTH": "apple"})

# spaCy v2 signature; spaCy v3 expects matcher.add('test', [pattern])
matcher.add('test', None, pattern)
spans = [doc[start:end] for match_id, start, end in matcher(doc)]
for span in spans:
    print(span.text)

# is a really nice, green apple
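
The same idea can be wrapped in a small helper so the maximum gap becomes a parameter. This is only a sketch: the name bounded_gap_pattern and its arguments are made up for illustration and are not part of spaCy. It builds the plain list of token dicts that the Matcher accepts.

```python
def bounded_gap_pattern(first, last, max_gap):
    """Return a Matcher token pattern: `first`, then 0..max_gap arbitrary tokens, then `last`.

    Hypothetical helper for illustration; not part of the spaCy API.
    """
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    pattern = [{"ORTH": first}]
    # Each {"OP": "?"} entry is a wildcard token that may match zero or one token,
    # so max_gap of them allow anywhere from 0 to max_gap tokens in between.
    pattern.extend({"OP": "?"} for _ in range(max_gap))
    pattern.append({"ORTH": last})
    return pattern


pattern = bounded_gap_pattern("is", "apple", 5)
print(len(pattern))  # two keywords plus five optional wildcards
```

The result can be passed to matcher.add exactly like the hand-built pattern above. Newer spaCy releases (3.5 and later, as far as I know) also accept range quantifiers such as {"OP": "{,5}"} on a single wildcard token, which may be worth checking against the installed version.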