Why do my regular expressions work individually but fail when I combine them?

Time: 2015-05-03 19:29:14

Tags: regex

I am trying to write a regular expression with three capture groups. The text I want to match is the following:

<td class="no-wrap past-rating" style="background-color: rgb(228, 254, 199);">
                    <div>
                        <b class="place">2</b><sup> 1</sup><sup class="remaining"> 1/2</sup>
                    </div>
                    <div>
                        46.96
                    </div>
                </td>

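I am trying to match: 2, 1 and 1/2.

I have written the following regular expressions. Each one matches the desired text on its own, but when I combine any two, or all three, I get no match.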

/(?<one>(?<=<b class="place">).*(?=<\/b>))/ matches=> 2 

/(?<two>(?<=<\/b><sup>).*?(?=<\/sup><sup class=))/ matches=> 1

 /(?<three>(?<=="remaining">).*(?=<\/sup>))/ matches => 1/2

Unfortunately,

/(?<one>(?<=<b class="place">).*(?=<\/b>))(?<two>(?<=<\/b><sup>).*?(?=<\/sup><sup class=))(?<three>(?<=="remaining">).*(?=<\/sup>))/ 

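fails to match anything. Can anyone tell me where I am going wrong, and why the combined regex fails while the individual expressions match successfully?

3 answers:

Answer 0 (score: 1)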

Maybe you should try something like this:

/<b class="place">(.*)<\/b><sup>\s*(.*)<\/sup><sup class="remaining">\s*(.*)<\/sup>/

Demo online
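
The pattern above matches the tags as literal text instead of relying on lookarounds, so each capture group only has to pick up the text between them. A minimal sketch in Python, assuming the HTML is held in a string; the variable names are illustrative, and Python's `re` module stands in for whichever regex engine the question actually uses:

```python
import re

# Sample input from the question, condensed to the relevant lines.
html = '''<div>
    <b class="place">2</b><sup> 1</sup><sup class="remaining"> 1/2</sup>
</div>'''

# The tags are consumed as literal text; \s* skips the leading space
# inside each <sup> element so it does not end up in the capture.
pattern = re.compile(
    r'<b class="place">(.*)</b><sup>\s*(.*)</sup>'
    r'<sup class="remaining">\s*(.*)</sup>'
)

match = pattern.search(html)
place, sup, remaining = match.groups() if match else (None, None, None)
print(place, sup, remaining)
```

Because `.` does not match a newline by default, each `(.*)` stays on the line that holds the three values.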

Answer 1 (score: 1)

I guess you could use a simpler regex, i.e.:

/>\s*?([\d\/]+)\s*?<\//

Output:

MATCH 1
`2`
MATCH 2
`1`
MATCH 3
`1/2`

Demo:

https://regex101.com/r/dC7zR5/1

Explanation:

/>\s*?([\d\/]+)\s*?<\//gm

    > matches the characters > literally
    \s*? match any white space character [\r\n\t\f ]
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
    1st Capturing group ([\d\/]+)
        [\d\/]+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \d match a digit [0-9]
            \/ matches the character / literally
    \s*? match any white space character [\r\n\t\f ]
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
    < matches the characters < literally
    \/ matches the character / literally
    g modifier: global. All matches (don't return on first match)
    m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
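
As a rough illustration of this answer, the same pattern can be run over the whole snippet with Python's `re.findall`, which returns the contents of the single capture group for every match. This is only a sketch: the variable names are illustrative, and the `g` flag has no Python equivalent because `findall` already returns all matches.

```python
import re

# The full snippet from the question.
html = '''<td class="no-wrap past-rating" style="background-color: rgb(228, 254, 199);">
    <div>
        <b class="place">2</b><sup> 1</sup><sup class="remaining"> 1/2</sup>
    </div>
    <div>
        46.96
    </div>
</td>'''

# '>' then optional whitespace, a run of digits and slashes, optional
# whitespace, then the start of a closing tag '</'.
values = re.findall(r'>\s*?([\d/]+)\s*?</', html)
print(values)
```

The value 46.96 is skipped because `.` is not in the character class. Note that this pattern does not look at tag names, so it would also pick up any other digits-and-slashes text that sits directly between a `>` and a closing tag.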

Answer 2 (score: 1)

To "combine" the regular expressions, you need to use the alternation operator |

(?<one>(?<=<b class="place">).*(?=<\/b>))|(?<two>(?<=<\/b><sup>).*?(?=<\/sup><sup class=))|(?<three>(?<=="remaining">).*(?=<\/sup>))

See the demo
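
The reason plain concatenation fails is that lookarounds are zero-width. Once group one has matched 2, group two has to start at that same position, just before `</b>`, yet its lookbehind requires `</b><sup>` to sit immediately before it. Both conditions cannot hold at once for this input. With alternation, each branch is tried on its own and the three values come back as three separate matches. A small Python sketch of the difference, assuming Python's `(?P<name>...)` spelling for named groups (the patterns are otherwise unchanged):

```python
import re

html = '<b class="place">2</b><sup> 1</sup><sup class="remaining"> 1/2</sup>'

# The three expressions from the question, with Python-style named groups.
one = r'(?P<one>(?<=<b class="place">).*(?=</b>))'
two = r'(?P<two>(?<=</b><sup>).*?(?=</sup><sup class=))'
three = r'(?P<three>(?<=="remaining">).*(?=</sup>))'

# Concatenated: the groups must match back to back, which is impossible
# because the tags between the values are never consumed.
concatenated = re.search(one + two + three, html)

# Alternated: every match comes from exactly one branch, and lastgroup
# names the group that matched.
found = {}
for m in re.finditer('|'.join([one, two, three]), html):
    found[m.lastgroup] = m.group().strip()

print(concatenated, found)
```

The `strip()` call removes the leading space that the second and third expressions capture from inside the `<sup>` elements.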

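However, since what you are trying to match is HTML, I would use a regex that can handle multiple attributes in the tags and multiple lines in the input text, like this: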

<b\b[^<]*class="place"[^<]*>(?<one>[^<]*)|<\/b><sup[^<]*>(?<two>[^<]*)|="remaining"[^<]*>(?<three>[^<]*(?=<\/sup>))

See another demo
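
To show what the extra `[^<]*` pieces buy, here is a hedged Python sketch run against a hypothetical variation of the markup, with added attributes and a line break inside a tag. The `id`, `title` and `data-k` attributes are invented for illustration and do not come from the question; the named groups again use Python's `(?P<name>...)` syntax.

```python
import re

# Hypothetical input: extra attributes and a line break inside a tag.
html = '''<b id="r1" class="place" title="x">2</b><sup data-k="v"> 1</sup><sup class="remaining"
    data-k="v"> 1/2</sup>'''

# [^<]* matches any run of characters other than '<', including newlines,
# so it skips over additional attributes up to the closing '>' of the tag.
pattern = re.compile(
    r'<b\b[^<]*class="place"[^<]*>(?P<one>[^<]*)'
    r'|</b><sup[^<]*>(?P<two>[^<]*)'
    r'|="remaining"[^<]*>(?P<three>[^<]*(?=</sup>))'
)

# Each match comes from one branch; lastgroup names its capture group.
found = {m.lastgroup: m.group(m.lastgroup).strip() for m in pattern.finditer(html)}
print(found)
```

Unlike the lookbehind version, these branches consume the tags themselves, so the values have to be read from the named groups rather than from the whole match.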