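Original sentences
key_list = ['techniques from nonlinear analysis and partial differential equations form the basis for these studies.', 'differential equations are cool.', 'it is not too great of an equation']
spaCy tagging: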
[[['techniques', 'NNS'], ['from', 'IN'], ['nonlinear', 'JJ'], ['analysis', 'NN'], ['and', 'CC'], ['partial', 'JJ'], ['differential', 'JJ'], ['equations', 'NNS'], ['form', 'VBP'], ['the', 'DT'], ['basis', 'NN'], ['for', 'IN'], ['these', 'DT'], ['studies', 'NNS'], ['.', '.']],
[['differential', 'JJ'], ['equations', 'NNS'], ['are', 'VBP'], ['cool', 'JJ'], ['.', '.']],
[['it', 'PRP'], ['is', 'VBZ'], ['not', 'RB'], ['too', 'RB'], ['great', 'JJ'], ['of', 'IN'], ['an', 'DT'], ['equation', 'NN']]]
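I am using wordnet to make things easier, but is there a way to get all the nouns in a sentence, together with phrases that match tag patterns such as [RB, RB, JJ] and [JJ, NN]?
Required output: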
[['techniques', 'nonlinear analysis', 'differential equations', 'basis', 'studies'], ['differential equations'], ['not too great', 'equation']]
Answer 0 (score: 1)
If I understand your question correctly, you need something like this:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)
text= """techniques from nonlinear analysis and partial
differential equations form the basis for these studies.
Differential equations are cool. It is not too great of an equation"""
doc = nlp(text)
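# Pattern 1: any single noun (singular or plural).
pattern1 = [{"TAG": {"IN": ["NN", "NNS"]}}]
# Pattern 2: two adverbs followed by an adjective, e.g. "not too great".
pattern2 = [{"TAG": "RB"}, {"TAG": "RB"}, {"TAG": "JJ"}]
matcher.add("matcher", [pattern1, pattern2])
# Match each sentence separately and print the matched spans.
for sent in doc.sents:
    matches = matcher(sent)
    for match_id, start, end in matches:
        print(sent[start:end])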
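
The code above prints single nouns and [RB, RB, JJ] phrases, but it does not cover the [JJ, NN] pattern or group the results per sentence. As a rough sketch of one way to get the required output, the following works directly on the tagged lists from the question rather than on spaCy objects. The names `PATTERNS` and `extract_phrases` are made up for this illustration, and the "longest pattern first, no overlaps" rule is an assumption about what the asker wants, not something spaCy does by default.

```python
# Tagged sentences copied from the question: [token, tag] pairs per sentence.
tagged = [
    [['techniques', 'NNS'], ['from', 'IN'], ['nonlinear', 'JJ'], ['analysis', 'NN'],
     ['and', 'CC'], ['partial', 'JJ'], ['differential', 'JJ'], ['equations', 'NNS'],
     ['form', 'VBP'], ['the', 'DT'], ['basis', 'NN'], ['for', 'IN'],
     ['these', 'DT'], ['studies', 'NNS'], ['.', '.']],
    [['differential', 'JJ'], ['equations', 'NNS'], ['are', 'VBP'], ['cool', 'JJ'], ['.', '.']],
    [['it', 'PRP'], ['is', 'VBZ'], ['not', 'RB'], ['too', 'RB'], ['great', 'JJ'],
     ['of', 'IN'], ['an', 'DT'], ['equation', 'NN']],
]

NOUN = {"NN", "NNS"}

# Hypothetical pattern table: each pattern is a sequence of allowed-tag sets.
# Longer patterns are listed first so they take priority over a bare noun.
PATTERNS = [
    [{"RB"}, {"RB"}, {"JJ"}],  # e.g. "not too great"
    [{"JJ"}, NOUN],            # e.g. "nonlinear analysis"
    [NOUN],                    # e.g. "basis"
]


def extract_phrases(sentence):
    """Scan left to right, taking the first pattern that matches at each position."""
    phrases = []
    i = 0
    while i < len(sentence):
        for pattern in PATTERNS:
            window = sentence[i:i + len(pattern)]
            if len(window) == len(pattern) and all(
                tag in allowed for (_, tag), allowed in zip(window, pattern)
            ):
                phrases.append(" ".join(token for token, _ in window))
                i += len(pattern)  # skip past the match so phrases never overlap
                break
        else:
            i += 1  # no pattern starts here; move to the next token
    return phrases


result = [extract_phrases(sentence) for sentence in tagged]
print(result)
```

Because a match consumes its tokens, 'analysis' is not reported again after 'nonlinear analysis', and 'partial' is skipped because it is followed by another adjective rather than a noun. To go from spaCy to this input format, each sentence could be converted with `[[t.text, t.tag_] for t in sent]`.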