spaCy Matcher / PhraseMatcher spans: how to extend a span to the current sentence?

Asked: 2019-03-10 02:59:38

Tags: spacy

I have a text: "Alphabet is a company. Also it is behind Google. But these are not the same."
When I apply the spaCy matcher to it, it returns the label, start, and end of each match.
Now, given the start and end, is it possible to extend the span to the whole sentence (up to the word ending with a period)?

    matches = self.matcher(doc)
    spans = []
    for label, start, end in matches:
        span = Span(doc, start, end, label=label)

So the output I expect looks like the following...

Actual:

Entities [('Alphabet', 'myORG'), ('Google', 'myORG')]

Expected:

Entities [('Alphabet is a company', 'myORG'), ('Also it is behind Google', 'myORG')]

The code I am using:

    from __future__ import unicode_literals, print_function
    import plac
    from spacy.lang.en import English
    from spacy.matcher import PhraseMatcher
    from spacy.tokens import Doc, Span, Token


    def main(text="Alphabet is a company. Also it is behind Google. But these are not the same", *companies):
        nlp = English()
        if not companies:
            companies = ['Alphabet', 'Google', 'Netflix', 'Apple']
        component = myFindingsMatcher(nlp, companies)
        nlp.add_pipe(component, last=True)
        doc = nlp(text)
        print('Entities', [(e.text, e.label_) for e in doc.ents])  # all orgs are entities


    class myFindingsMatcher(object):
        name = 'myFindings_matcher'

        def __init__(self, nlp, companies=tuple(), label='myORG'):
            patterns = [nlp(finding_type) for finding_type in companies]
            self.matcher = PhraseMatcher(nlp.vocab)
            self.matcher.add(label, None, *patterns)

        def __call__(self, doc):
            matches = self.matcher(doc)
            spans = []
            for label, start, end in matches:
                span = Span(doc, start, end, label=label)
                spans.append(span)
            doc.ents = spans
            return doc


    if __name__ == '__main__':
        plac.call(main)

Thanks.

1 answer:

Answer 0 (score: 0)

The attribute e.sent refers to the sentence containing the entity e.

A minimal working example with the pretrained model en_core_web_sm and its built-in matcher:

import spacy

my_text = "Alphabet is a company. Also it is behind Google. But these are not the same"
nlp = spacy.load('en_core_web_sm')
doc = nlp(my_text)
print('Entities', [(e.text, e.label_, e.sent) for e in doc.ents])

This produces:

Entities [('Alphabet', 'ORG', Alphabet is a company.), ('Google', 'ORG', Also it is behind Google.)]

If you want to implement your own matcher with nlp = English(), you have to add a component that recognizes sentences:

nlp.add_pipe(nlp.create_pipe('sentencizer'))

And when defining the entity spans, you must make sure e.sent is set correctly. Note that by looking at the sentence offsets (counted in tokens), you can easily derive the correct spans:

print('Sentences', [(s.start, s.end, s.text) for s in doc.sents])
print('Entities', [(e.start, e.end, e.text, e.label_) for e in doc.ents]) 

which prints:

Sentences [(0, 5, 'Alphabet is a company.'), (5, 11, 'Also it is behind Google.'), (11, 17, 'But these are not the same')]

Entities [(0, 1, 'Alphabet', 'ORG'), (9, 10, 'Google', 'ORG')]