Filtering POS-tag results by tag patterns and other tags

Date: 2021-03-04 15:11:48

Tags: python spacy

Original sentences

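key_list = ['techniques from nonlinear analysis and partial differential equations form the basis for these studies.', 'differential equations are cool.', 'it is not too great of an equation']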

spaCy tagging:
[[['techniques', 'NNS'], ['from', 'IN'], ['nonlinear', 'JJ'], ['analysis', 'NN'], ['and', 'CC'], ['partial', 'JJ'], ['differential', 'JJ'], ['equations', 'NNS'], ['form', 'VBP'], ['the', 'DT'], ['basis', 'NN'], ['for', 'IN'], ['these', 'DT'], ['studies', 'NNS'], ['.', '.']],
[['differential', 'JJ'], ['equations', 'NNS'], ['are', 'VBP'], ['cool', 'JJ'], ['.', '.']], 
[['it', 'PRP'], ['is', 'VBZ'], ['not', 'RB'], ['too', 'RB'], ['great', 'JJ'], ['of', 'IN'], ['an', 'DT'], ['equation', 'NN']]]

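I'm using WordNet to make things easier, but is there a way to get all the nouns in each sentence, together with phrases that match tag patterns such as [RB, RB, JJ] and [JJ, NN]?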

Required output:
[['techniques', 'nonlinear analysis', 'differential equations', 'basis', 'studies'], ['differential equations'], ['not too great', 'equation']]

1 answer:

Answer 0: (score: 1)

If I understand your question correctly, you need something like this:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

text= """techniques from nonlinear analysis and partial
     differential equations form the basis for these studies. 
     Differential equations are cool. It is not too great of an equation"""
doc = nlp(text)

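# Pattern 1: any single noun (singular or plural).
pattern1 = [{"TAG": {"IN": ["NN", "NNS"]}}]
# Pattern 2: two adverbs followed by an adjective, e.g. "not too great".
pattern2 = [{"TAG": "RB"},{"TAG": "RB"}, {"TAG": "JJ"}]

# spaCy v3 expects a list of patterns as the second argument.
matcher.add("matcher", [pattern1, pattern2])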

for sent in doc.sents:
    matches = matcher(sent)
    for match_id, start, end in matches:
        print(sent[start:end])
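
The Matcher code above reports every match separately, so a noun that also belongs to a longer phrase is printed on its own. If the goal is the exact list from the question, one possible approach is to scan the already-tagged tokens and take the longest matching pattern at each position. The following is only a minimal sketch in plain Python, not part of the original answer. The names `PATTERNS` and `extract_phrases` are hypothetical, and the extra `("JJ", "NNS")` pattern is an assumption, added because "differential equations" is tagged JJ + NNS rather than JJ + NN.

```python
from typing import List, Sequence, Tuple

# Tag sequences to extract (hypothetical helper data, not from the answer).
PATTERNS: List[Tuple[str, ...]] = [
    ("RB", "RB", "JJ"),  # e.g. "not too great"
    ("JJ", "NN"),        # e.g. "nonlinear analysis"
    ("JJ", "NNS"),       # assumption: plural variant of [JJ, NN]
    ("NN",),             # single nouns
    ("NNS",),
]


def extract_phrases(tagged: Sequence[Sequence[str]],
                    patterns: Sequence[Tuple[str, ...]] = PATTERNS) -> List[str]:
    """Scan left to right, taking the longest pattern that matches at each token."""
    ordered = sorted(patterns, key=len, reverse=True)  # try longer patterns first
    phrases: List[str] = []
    i = 0
    while i < len(tagged):
        for pattern in ordered:
            window = tagged[i:i + len(pattern)]
            if tuple(tok[1] for tok in window) == pattern:
                phrases.append(" ".join(tok[0] for tok in window))
                i += len(pattern)  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no pattern starts here, move to the next token
    return phrases


# Tagged sentences copied from the question.
tagged_sentences = [
    [['techniques', 'NNS'], ['from', 'IN'], ['nonlinear', 'JJ'], ['analysis', 'NN'],
     ['and', 'CC'], ['partial', 'JJ'], ['differential', 'JJ'], ['equations', 'NNS'],
     ['form', 'VBP'], ['the', 'DT'], ['basis', 'NN'], ['for', 'IN'],
     ['these', 'DT'], ['studies', 'NNS'], ['.', '.']],
    [['differential', 'JJ'], ['equations', 'NNS'], ['are', 'VBP'], ['cool', 'JJ'], ['.', '.']],
    [['it', 'PRP'], ['is', 'VBZ'], ['not', 'RB'], ['too', 'RB'], ['great', 'JJ'],
     ['of', 'IN'], ['an', 'DT'], ['equation', 'NN']],
]

result = [extract_phrases(sentence) for sentence in tagged_sentences]
print(result)
# [['techniques', 'nonlinear analysis', 'differential equations', 'basis', 'studies'],
#  ['differential equations'], ['not too great', 'equation']]
```

Because the scan consumes the tokens of each match, "analysis" is not reported a second time after "nonlinear analysis". When working with spaCy `Span` objects instead of plain lists, `spacy.util.filter_spans` may serve a similar purpose, since it drops overlapping spans in favour of the longest one.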