I just added the following extension to Token in spaCy. I want to check whether a token has a given dependency label on one of its children:
from spacy.tokens import Token
has_dep = lambda token,name: name in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP', method=has_dep)
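The following:
doc = nlp(u'We are walking around.')
walking = doc[2]
walking._.HAS_DEP('nsubj')
outputs True, because 'walking' has a child whose dependency label is 'nsubj' (the word 'We').
However, I can't figure out how to use this extension in spaCy's Matcher. My attempt, which I expected to output walking, does not seem to work.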
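Answer 0 (score: 1)
I think you can achieve your goal with a getter. Matcher patterns compare the value of an extension attribute, so an extension registered with method= (which needs an argument) cannot be used in a pattern, while one registered with getter= can: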
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Token

# Getter extension: True if any child of the token has the dependency label 'nsubj'.
has_dep = lambda token: 'nsubj' in [child.dep_ for child in token.children]
Token.set_extension('HAS_DEP_NSUBJ', getter=has_dep, force=True)

nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab)
# spaCy v2 signature; in spaCy v3 use matcher.add("depnsubj", [[{"_": {"HAS_DEP_NSUBJ": True}}]])
matcher.add("depnsubj", None, [{"_": {"HAS_DEP_NSUBJ": True}}])

doc = nlp("We're walking around the house.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # name of the matched pattern
    span = doc[start:end]
    print(span)

Output:
walking
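If you need this for more than one dependency label, one option is to register one getter per label. The snippet below is only a sketch under a few assumptions: it uses the spaCy v3 signatures (matcher.add(key, [pattern]) and Doc(vocab, words=..., heads=..., deps=...)), the helper register_has_dep and the HAS_DEP_<LABEL> naming are made up for illustration, and the parse is written by hand so that no model has to be loaded.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc, Token
from spacy.vocab import Vocab


def register_has_dep(label):
    """Register a boolean getter extension for one dependency label.

    Hypothetical helper: the HAS_DEP_<LABEL> naming is an assumption.
    """
    name = "HAS_DEP_" + label.upper()
    # True if any direct child of the token carries the given label.
    getter = lambda token: any(child.dep_ == label for child in token.children)
    Token.set_extension(name, getter=getter, force=True)
    return name


# Hand-written parse of "We are walking around." (assumed annotations,
# standing in for what a trained pipeline would produce).
words = ["We", "are", "walking", "around", "."]
heads = [2, 2, 2, 2, 2]  # absolute index of each token's head
deps = ["nsubj", "aux", "ROOT", "advmod", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

attr = register_has_dep("nsubj")
matcher = Matcher(doc.vocab)
matcher.add("depnsubj", [[{"_": {attr: True}}]])  # spaCy v3 signature

matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```

With a real pipeline you would replace the hand-built Doc with nlp(text) and keep the rest unchanged.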
Answer 1 (score: 0)
I think you can use doc.retokenize() and token.head instead, as follows:
from spacy.matcher import Matcher
import en_core_web_sm

nlp = en_core_web_sm.load()
matcher = Matcher(nlp.vocab)
pattern = [{'DEP': 'nsubj'}]
# spaCy v2 signature; in spaCy v3 use matcher.add("depnsubj", [pattern])
matcher.add("depnsubj", None, pattern)

doc = nlp("We're walking around the house.")
matches = matcher(doc)
matched_spans = []
for match_id, start, end in matches:
    matched_spans.append(doc[start:end])

with doc.retokenize() as retokenizer:
    for span in matched_spans:
        retokenizer.merge(span)
        # The head of each 'nsubj' token is the token that has it as a child.
        for token in span:
            print(token.head)
Output:
walking
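The idea in this answer (find the 'nsubj' tokens and look at their heads) can also be sketched without the Matcher or the retokenizer. This is only an illustration: it assumes the spaCy v3 Doc constructor and uses a hand-written parse instead of a loaded model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-written parse of "We are walking around." (assumed annotations,
# standing in for what a trained pipeline would produce).
words = ["We", "are", "walking", "around", "."]
heads = [2, 2, 2, 2, 2]  # absolute index of each token's head
deps = ["nsubj", "aux", "ROOT", "advmod", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# Every token labelled 'nsubj' points at the token that has it as a child.
subject_heads = [token.head.text for token in doc if token.dep_ == "nsubj"]
print(subject_heads)
```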