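For example, I am trying to match two keywords with at most five wildcard tokens between them. I could add five patterns, each with a different number of wildcards, but that is not a good solution. Is there an option like {"OP": "+5"}, or some other solution?
Example: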
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a really nice, green apple. One apple a day ...!")
matcher = Matcher(nlp.vocab)
# "is", then one or more tokens of any kind, then "apple"
pattern = [{"ORTH": "is"}, {"OP": "+"}, {"ORTH": "apple"}]
# spaCy v2 signature; in spaCy v3 this call is matcher.add("test", [pattern])
matcher.add("test", None, pattern)
spans = [doc[start:end] for match_id, start, end in matcher(doc)]
for span in spans:
    print(span)
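This gives two matches:
is a really nice, green apple
and
is a really nice, green apple. One apple
But I only want the first one. The solution should work in general, so splitting the text into sentences and the like is not an option.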
Answer 0: (score: 0)
You can do the following:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a really nice, green apple. One apple a day ...!")
matcher = Matcher(nlp.vocab)
pattern = [{"ORTH": "is"}]
# Each {"OP": "?"} is an optional wildcard token, so five of them
# allow between zero and five arbitrary tokens.
for _ in range(5):
    pattern.append({"OP": "?"})
pattern.append({"ORTH": "apple"})
# spaCy v2 signature; in spaCy v3 this call is matcher.add("test", [pattern])
matcher.add("test", None, pattern)
spans = [doc[start:end] for match_id, start, end in matcher(doc)]
for span in spans:
    print(span)
# is a really nice, green apple
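The same idea can be wrapped in a small helper so that the maximum gap becomes a parameter. This is only a sketch: bounded_gap_pattern and its argument names are hypothetical and not part of spaCy. It simply builds the same list of token dicts as the loop above, relying on a token dict with only "OP": "?" matching zero or one token of any kind.

```python
def bounded_gap_pattern(first, last, max_gap):
    """Build a Matcher pattern: `first`, then 0..max_gap arbitrary tokens, then `last`.

    Hypothetical helper for illustration; not a spaCy API.
    """
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    pattern = [{"ORTH": first}]
    # One optional wildcard per allowed gap token.
    pattern.extend({"OP": "?"} for _ in range(max_gap))
    pattern.append({"ORTH": last})
    return pattern


# Same pattern as the hand-built one above: "is", up to five tokens, "apple".
pattern = bounded_gap_pattern("is", "apple", 5)
```

The resulting list can be passed to matcher.add in the same way as the hand-built pattern. Note that every wildcard is optional, so a gap of zero tokens ("is apple") would also match.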