Question

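I have a simple pattern I am trying to match: any characters captured between parentheses at the end of an HTML paragraph. I run into trouble whenever there is other parenthesized content in that paragraph:

i.e.

If the input string is ".....(321)</p>" I want to get the value (321)

However, if the paragraph has this text: "......(123)(321)</p>" my regex returns "(123)(321)" (everything between the first opening "(" and the last closing ")")

I am using the regex pattern "\s\(.+\)</P>"

How can I get the correct value (using VB.NET)?

This is what I have so far: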

    Dim reg As New Regex("\s\(.+\)</P>", RegexOptions.IgnoreCase)
    Dim matchC As MatchCollection = reg.Matches(su.Question)
    If matchC.Count > 0 Then
        Dim lastMatch As Match = matchC(matchC.Count - 1)
        Dim DesiredValue As String = lastMatch.Value
    End If

Answer 1

Just change the expression to be non-greedy and reverse the matching direction:

Dim reg As New Regex("\s\(.+?\)</P>", RegexOptions.IgnoreCase Or RegexOptions.RightToLeft)

Or make it match only a single closing parenthesis:

"\s\([^)]+\)</P>"

Or make it match only digits inside the parentheses:

"\s\(\d+\)</P>"

Edit: for the non-greedy sample to work, you need to set the RightToLeft flag on the Regex object.
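As a rough illustration, the three patterns above can be tried side by side. This is only a sketch: the sample input string and the module name are made up for the example, and the call to Trim() is there just to drop the whitespace that \s matches.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LastParenthesesDemo
    Sub Main()
        ' Hypothetical sample input with two parenthesized groups.
        Dim input As String = "Some text (123) (321)</p>"

        ' Non-greedy, scanned from the end of the string.
        Dim lazyRtl As New Regex("\s\(.+?\)</P>", RegexOptions.IgnoreCase Or RegexOptions.RightToLeft)
        ' Negated character class: the content may not contain ")".
        Dim negated As New Regex("\s\([^)]+\)</P>", RegexOptions.IgnoreCase)
        ' Digits only between the parentheses.
        Dim digits As New Regex("\s\(\d+\)</P>", RegexOptions.IgnoreCase)

        For Each rx As Regex In {lazyRtl, negated, digits}
            Dim m As Match = rx.Match(input)
            If m.Success Then
                ' Trim removes the leading whitespace matched by \s.
                Console.WriteLine(m.Value.Trim())
            End If
        Next
    End Sub
End Module
```

For this input, each of the three patterns should match only the last group together with the closing tag, i.e. (321)</p>.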

Answer 2

Dim reg As New Regex("\s\(\d+\)</P>", RegexOptions.IgnoreCase)

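Your stumbling blocks are the lack of specificity of . (it matches any character except a newline, including parentheses) and the greediness of + (it matches as much as possible).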

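Be more specific (\d+) or less greedy (.+?). Note that .+? on its own is not enough here: a left-to-right match still starts at the first opening parenthesis, so the lazy version also needs RegexOptions.RightToLeft.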
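To see the effect of greediness and specificity, here is a small sketch comparing the original greedy pattern, a lazy variant scanned left to right, and the digit-only pattern. The input string and the module name are assumptions made for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVersusSpecificDemo
    Sub Main()
        ' Hypothetical sample input with two parenthesized groups.
        Dim input As String = "Some text (123) (321)</p>"

        ' Original pattern: .+ runs from the first "(" to the last ")".
        Dim greedy As New Regex("\s\(.+\)</P>", RegexOptions.IgnoreCase)
        ' Lazy, but still scanned left to right, so it also starts at the first "(".
        Dim lazy As New Regex("\s\(.+?\)</P>", RegexOptions.IgnoreCase)
        ' Specific: only digits are allowed between the parentheses.
        Dim specific As New Regex("\s\(\d+\)</P>", RegexOptions.IgnoreCase)

        Console.WriteLine("greedy:   " & greedy.Match(input).Value.Trim())
        Console.WriteLine("lazy:     " & lazy.Match(input).Value.Trim())
        Console.WriteLine("specific: " & specific.Match(input).Value.Trim())
    End Sub
End Module
```

With this input, the greedy and the left-to-right lazy patterns should both return (123) (321)</p>, while the digit-only pattern returns (321)</p>.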

Answer 3

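You need to use a lookahead (?= ) to anchor the pattern. It gives the parser a hint about where the data should stop and anchors the match there. Here is an example that gets the () data immediately preceding the p tag anchor: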

(?:\()([^)]+)(?:\))(?=</[pP]>)


(?:\()        - Match but don't capture a (
([^)]+)       - Capture all the data until a ) is hit. [^ ] is a negated character class
(?:\))        - Match but don't capture the )
(?=</[pP]>)   - Lookahead: require a following </p> or </P> without consuming or capturing it
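A minimal sketch of how this pattern might be used from VB.NET follows. The input string and the module name are hypothetical. Because the closing tag sits inside the lookahead, it is not part of the match, and capture group 1 holds the text without the parentheses.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    Sub Main()
        ' Hypothetical sample input with two parenthesized groups.
        Dim input As String = "Some text (123) (321)</p>"

        ' The lookahead requires </p> or </P> right after the closing parenthesis.
        Dim reg As New Regex("(?:\()([^)]+)(?:\))(?=</[pP]>)")
        Dim m As Match = reg.Match(input)

        If m.Success Then
            ' Whole match: the parentheses and their content, without the tag.
            Console.WriteLine(m.Value)
            ' Group 1: only the content between the parentheses.
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

For this input the whole match should be (321) and group 1 should be 321, which is closer to the value asked for in the question than the matches that include the closing tag.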

HTH

.NET regular expression to get the text within parentheses at the end of a <p> tag
