Specifying content that a regular expression should not match

Date: 2017-04-22 15:25:22

Tags: c# regex

My input has nested parentheses, and nodes can contain nodes of the same kind. Sample data:

(S (S (NP (PRP It)) (VP (VP (VBZ has) (VP (VBN been) (PP (IN over) (NP (NN half))) (NP (DT a) (NN year)) (SBAR (IN since) (S (NP (DT a) (ADJP (CD 19.5) (NN %)) (NN tax)) (VP (VBD was) (VP (VBN imposed) (PP (IN by) (NP (NP (NN government)) (PP (IN of) (NP (NNP Punjab))))) (PP (IN on) (NP (DT the) (NN internet) (NNS services))))))))) (CC and) (VP (ADVP (RB just)) (VBP like) (NP (JJ YouTube) (NN ban)) (PRT (RP back)) (PP (IN in) (NP (CD 2012)))))) (, ,) (NP (NP (DT this) (NN internet) (NN tax)) (PP (IN in) (NP (NP (DT the) (JJS largest) (NN province)) (PP (IN of) (NP (NNP Pakistan)))))) (VP (VBZ does) (RB n't) (VP (VB seem) (S (VP (TO to) (VP (VB see) (NP (PRP$ its) (NN end)) (PP (ADVP (NP (DT any) (NN time)) (RB soon)) (IN despite) (NP (NP (JJ various) (JJ verbal) (NN commitment)) (PP (IN by) (NP (JJ Chief) (NNP Minister) (NNP Shahbaz) (NNP Sharif) (CC and) (JJ Provincial) (NN Finance) (NNP Minister) (NNP Ayesha) (NNPS Ghaus) (NNP Pasha))))) (PP (IN with) (NP (PDT all) (DT the) (NNS stakeholders)))))))) (. .))
(S (NP (JJ Inside) (NNS reports)) (VP (VBD confirmed) (SBAR (IN that) (S (NP (NP (DT a) (NN camp) (NN lead)) (PP (IN by) (NP (NP (NNP Chairman) (NNP Punjab) (NNP Revenue) (NNP Authority)) (PRN (-LRB- -LRB-) (NP (NNP PRA)) (-RRB- -RRB-))))) (VP (VBZ is) (NP (DT no) (NN mood) (S (VP (TO to) (VP (VB revoke) (NP (DT the) (NN tax)) (SBAR (IN that) (S (NP (PRP it)) (VP (VBD implemented) (PP (IN on) (NP (NNP May) (CD 28) (, ,) (CD 2014)))))))))))))) (. .))
(S (PP (IN On) (NP (DT the) (JJ other) (NN hand))) (, ,) (NP (NP (NNS people)) (PP (IN like) (NP (NP (NNP Chairman) (NNP PITB)) (, ,) (NP (NNP Umar) (NNP Saif)) (, ,)))) (VP (VBZ has) (VP (VBN been) (VP (VBG speaking) (PP (IN against) (NP (JJ such) (JJ anti-technology) (NN tax)))))) (. .))
(S (PP (IN In) (NP (PRP$ his) (JJ various) (NN statement))) (, ,) (NP (PRP he)) (VP (VBZ has) (VP (VBN shown) (NP (NN hope)) (SBAR (IN that) (S (NP (NNS things)) (VP (MD would) (VP (VB get) (NP (QP (JJR better) (IN but) (DT no)) (NNS signs)) (ADVP (RB yet)))))))) (. .))

I am interested in matching VP nodes, which often contain several other VP nodes. So I want to match the last VP node. My regex:

\(VP\s*((?!VP)|[^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))\)

The output is as follows:

(VP (VP (VBZ has) (VP (VBN been) (PP (IN over) (NP (NN half))) (NP (DT a) (NN year)) (SBAR (IN since) (S (NP (DT a) (ADJP (CD 19.5) (NN %)) (NN tax)) (VP (VBD was) (VP (VBN imposed) (PP (IN by) (NP (NP (NN government)) (PP (IN of) (NP (NNP Punjab))))) (PP (IN on) (NP (DT the) (NN internet) (NNS services))))))))) (CC and) (VP (ADVP (RB just)) (VBP like) (NP (JJ YouTube) (NN ban)) (PRT (RP back)) (PP (IN in) (NP (CD 2012)))))
(VP (VBZ does) (RB n't) (VP (VB seem) (S (VP (TO to) (VP (VB see) (NP (PRP$ its) (NN end)) (PP (ADVP (NP (DT any) (NN time)) (RB soon)) (IN despite) (NP (NP (JJ various) (JJ verbal) (NN commitment)) (PP (IN by) (NP (JJ Chief) (NNP Minister) (NNP Shahbaz) (NNP Sharif) (CC and) (JJ Provincial) (NN Finance) (NNP Minister) (NNP Ayesha) (NNPS Ghaus) (NNP Pasha))))) (PP (IN with) (NP (PDT all) (DT the) (NNS stakeholders))))))))
(VP (VBD confirmed) (SBAR (IN that) (S (NP (NP (DT a) (NN camp) (NN lead)) (PP (IN by) (NP (NP (NNP Chairman) (NNP Punjab) (NNP Revenue) (NNP Authority)) (PRN (-LRB- -LRB-) (NP (NNP PRA)) (-RRB- -RRB-))))) (VP (VBZ is) (NP (DT no) (NN mood) (S (VP (TO to) (VP (VB revoke) (NP (DT the) (NN tax)) (SBAR (IN that) (S (NP (PRP it)) (VP (VBD implemented) (PP (IN on) (NP (NNP May) (CD 28) (, ,) (CD 2014))))))))))))))
(VP (VBZ has) (VP (VBN been) (VP (VBG speaking) (PP (IN against) (NP (JJ such) (JJ anti-technology) (NN tax))))))
(VP (VBZ has) (VP (VBN shown) (NP (NN hope)) (SBAR (IN that) (S (NP (NNS things)) (VP (MD would) (VP (VB get) (NP (QP (JJR better) (IN but) (DT no)) (NNS signs)) (ADVP (RB yet))))))))

What I want to match instead: for example, in line 1 I want to match only the innermost VP:

(VP (VBN been) (PP (IN over) (NP (NN half))) (NP (DT a) (NN year)) (SBAR (IN since) (S (NP (DT a) (ADJP (CD 19.5) (NN %)) (NN tax)) (VP (VBD was) (VP (VBN imposed) (PP (IN by) (NP (NP (NN government)) (PP (IN of) (NP (NNP Punjab))))) (PP (IN on) (NP (DT the) (NN internet) (NNS services))))))))) (CC and) (VP (ADVP (RB just)) (VBP like) (NP (JJ YouTube) (NN ban)) (PRT (RP back)) (PP (IN in) (NP (CD 2012)))

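That is, the two outer levels, (VP (VP (VBZ has), should be ignored. Any ideas on how to specify a negative match group together with nested-parenthesis matching?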

1 answer:

Answer 0 (score: 0)

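You only need to find the last instance of VP and use it as the starting point. The last "(VP " on a line cannot contain another VP node, because any nested VP would appear later in the line, so the rest of your regex no longer has to care whether the content contains VP.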

\(VP\s+(?!.*\(VP\s)([^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))\)


\(VP\s+          # Look for "(VP "
(?!.*\(VP\s)     # No further "(VP " may follow on the same line
([^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))    # Arbitrary nested content
\)               # Closing parenthesis for "(VP ..."
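
To try this out, here is a minimal C# sketch that applies the pattern above with Regex.Matches. The class name LastVpDemo, the helper FindLastVp, and the shortened input line are made up for illustration and are not part of the original question or answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class LastVpDemo
{
    // Pattern from the answer, unchanged.
    const string Pattern =
        @"\(VP\s+(?!.*\(VP\s)([^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))\)";

    // Hypothetical helper: returns every match of the pattern in the text.
    // Because "." does not match a newline by default, the lookahead is
    // evaluated per line, so each line yields at most its last VP node.
    public static string[] FindLastVp(string text)
    {
        MatchCollection matches = Regex.Matches(text, Pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Value;
        return result;
    }

    static void Main()
    {
        // Shortened, made-up input in the same bracket format as the question.
        string line =
            "(S (NP (PRP he)) (VP (VBZ has) (VP (VBN shown) (NP (NN hope)))) (. .))";

        foreach (string vp in FindLastVp(line))
            Console.WriteLine(vp);   // prints: (VP (VBN shown) (NP (NN hope)))
    }
}
```

The outer "(VP (VBZ has) ..." is rejected by the negative lookahead because another "(VP " follows it, so only the inner node is returned. If you want to keep the commented, multi-line form of the pattern in source code, passing RegexOptions.IgnorePatternWhitespace should allow the # comments and the alignment spaces.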