Using custom token extensions in spaCy's Matcher

Date: 2020-07-26 10:13:35

Tags: python methods nlp spacy matcher

I just added the following extension to spaCy's Token. I want to check whether a token has a certain dependency label among its children, so I did the following:

from spacy.tokens import Token
has_dep = lambda token,name: name in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP', method=has_dep)

Running

doc = nlp(u'We are walking around.')
walking = doc[2]
walking._.HAS_DEP('nsubj')

outputs True, because 'walking' has a child whose dependency label is 'nsubj' (the word 'We').

However, I don't know how to use this extension in spaCy's Matcher. Below is what I wrote; I expected it to output True, but it doesn't seem to work:

walking
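The asker's Matcher code was lost from the page, but the likely cause of the failure can be sketched: Matcher patterns on the `_` key compare the extension's stored *value*, and for a `method=` extension that value is a bound callable, never `True`. This is an illustrative sketch (not the asker's original code), using a blank pipeline so no model download is needed:

```python
import spacy
from spacy.tokens import Token

nlp = spacy.blank("en")  # blank pipeline, no statistical model required
Token.set_extension(
    "HAS_DEP",
    method=lambda token, name: name in [child.dep_ for child in token.children],
    force=True,
)

token = nlp("walking")[0]
# The raw attribute value is a bound method, so a pattern like
# {"_": {"HAS_DEP": True}} compares a callable to True and never matches.
print(callable(token._.HAS_DEP))
```

This is why the accepted answer below switches to a `getter=` extension, whose value is a plain boolean the Matcher can compare.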

2 Answers:

Answer 0 (score: 1):

I think you can achieve what you want with a getter:

import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token
has_dep = lambda token: 'nsubj' in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP_NSUBJ', getter=has_dep, force=True)

nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab)
matcher.add("depnsubj", None, [{"_": {"HAS_DEP_NSUBJ": True}}])

doc = nlp("We're walking around the house.")
matches = matcher(doc)

for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  
    span = doc[start:end]
    print(span)

walking
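For readers without en_core_web_md installed, the same getter-based pattern can be verified on a hand-annotated parse. This sketch assumes spaCy v3, where the Doc constructor accepts heads and deps and Matcher.add takes a list of patterns (the answer above uses the older v2 add signature):

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc, Token

nlp = spacy.blank("en")  # no model download needed
Token.set_extension(
    "HAS_DEP_NSUBJ",
    getter=lambda t: "nsubj" in [child.dep_ for child in t.children],
    force=True,
)

# Hand-annotated parse of "We are walking around .":
words = ["We", "are", "walking", "around", "."]
heads = [2, 2, 2, 2, 2]  # every token attaches to "walking", the root
deps = ["nsubj", "aux", "ROOT", "advmod", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

matcher = Matcher(nlp.vocab)
matcher.add("depnsubj", [[{"_": {"HAS_DEP_NSUBJ": True}}]])  # v3 signature

for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```

Only "walking" matches, because it is the only token with an nsubj child.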

Answer 1 (score: 0):

I think you could instead use doc.retokenize() and token.head, like this:

from spacy.matcher import Matcher
import en_core_web_sm

nlp = en_core_web_sm.load()

matcher = Matcher(nlp.vocab)
pattern = [{'DEP': 'nsubj'}]
matcher.add("depnsubj", None, pattern)

doc = nlp("We're walking around the house.")
matches = matcher(doc)

matched_spans = []
for match_id, start, end in matches:
    span = doc[start:end]
    matched_spans.append(span)

with doc.retokenize() as retokenizer:
    for span in matched_spans:
        retokenizer.merge(span)
        for token in span:
            print(token.head)

Output:

walking
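Note that the retokenize step in this answer only matters when a match spans several tokens; for the single-token {'DEP': 'nsubj'} match, the head can be read directly from the matched token. A model-free sketch of that shortcut, again on a hand-annotated parse and assuming spaCy v3:

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")
# Hand-annotated parse of "We are walking around ." (no model needed):
words = ["We", "are", "walking", "around", "."]
heads = [2, 2, 2, 2, 2]  # every token attaches to "walking", the root
deps = ["nsubj", "aux", "ROOT", "advmod", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

matcher = Matcher(nlp.vocab)
matcher.add("depnsubj", [[{"DEP": "nsubj"}]])  # spaCy v3 add() signature

for _, start, end in matcher(doc):
    print(doc[start].head.text)  # head of the matched nsubj token
```

This prints the verb governing each matched subject, without mutating the Doc.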