How do I match all characters in a string and collect multiple groups?

Time: 2019-07-18 19:34:08

Tags: php regex

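I'm trying to figure out how to use preg_match_all to pick out every "-on_" inside a particular match.

I've tried a lot of regex patterns, but I'm completely stuck. The best regex person at our company has already spent an hour or two on it and hasn't made any progress either.

This one seemed the most promising: .*(-on_).* but it only grabs the last "-on_" of each match. Also, the first match works fine, but the second match is everything else on the page. I don't understand why.

A sample of the HTML I'm parsing...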

<span class="RatingStar__bew-avgstars__2enAh">
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
            </div>
        </span>

... more unimportant no-need-to-match code between ...


<span class="RatingStar__bew-avgstars__2enAh">
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
            </div>
            <div class="RatingStar__be-c-star__24d1B ">
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
                <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
            </div>
        </span>

What I'm using to parse it...

preg_match_all('~<span class="RatingStar__bew-avgstars__2enAh">.*(-on_).*</div></span>~', $html, $matches)

The response I get is too large to be useful here, so I'll just summarize:

array:2 [▼
  0 => array:2 [▼
    0 => "Perfectly correct match"
    1 => "Match of the rest of the page (not correct)"
  ]
  1 => array:2 [▼
    0 => "-on_" // Last on in the match
    1 => "-on_" // Last on in the second match
  ]
]

For the 2 matches I should get, I would expect the code above to give me 4 "-on_" groups in each match.

So what I actually expect is:

array:2 [▼
  0 => array:2 [▼
    0 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
    1 => "<span class="RatingStar__bew-avgstars__2enAh"><div class="RatingStar__be-c-star__24d1B "><span><span class="RatingStar__be-star-off__2ks1e">★</span></span><span ▶"
  ]
  1 => array:2 [▼
    0 => ["-on_","-on_","-on_","-on_"] 
    1 => ["-on_","-on_","-on_","-on_"]
  ]
]

Maybe I'm completely missing something here... any advice?

1 Answer:

Answer 0 (score: 2)

I believe this is closer to what you want:

~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?(-on_)[\s\S]*?</div>\s*</span>~
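Note that a capture group holds a single value per match, so this pattern yields one "-on_" per rating block rather than all of them. A minimal sketch of one way to get the per-block lists the question asks for: match the blocks first, then run a second preg_match_all inside each block. The sample HTML is a shortened, hypothetical version of the question's markup, and the variable names are made up for illustration.

```php
<?php
// Hypothetical, shortened sample: two rating blocks with 2 and 1 "on" stars.
$html = <<<'HTML'
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-on__28Wmg">★</span></span>
        <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
    </div>
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
        <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
    </div>
</span>
<p>unrelated markup</p>
<span class="RatingStar__bew-avgstars__2enAh">
    <div class="RatingStar__be-c-star__24d1B ">
        <span><span class="RatingStar__be-star-off__2ks1e">★</span></span>
        <span><span class="RatingStar__be-star-on__2ks1e">★</span></span>
    </div>
</span>
HTML;

$blockPattern = '~<span class="RatingStar__bew-avgstars__2enAh">[\s\S]*?(-on_)[\s\S]*?</div>\s*</span>~';

// Pass 1: one match per rating block; group 1 holds only the first "-on_".
preg_match_all($blockPattern, $html, $blocks);

// Pass 2: collect every "-on_" inside each matched block.
$onPerBlock = [];
foreach ($blocks[0] as $block) {
    preg_match_all('~-on_~', $block, $on);
    $onPerBlock[] = $on[0];
}

// Number of "on" stars per block: [2, 1] for this sample.
$counts = array_map('count', $onPerBlock);
print_r($counts);
```

One caveat with this sketch: because the pattern requires a "-on_", a block with no "on" stars would make the match run on into the next block. Dropping the (-on_) group from the first-pass pattern should avoid that, since the second pass does the counting anyway.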

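You have three problems:

  1. .* does not match the newline character \n. You can use [\s\S]* instead, which matches every whitespace character and every non-whitespace character (in other words, every character).
  2. The text </div></span> never appears in your snippet. There is whitespace between </div> and </span>, hence </div>\s*</span>.
  3. You are using the greedy quantifier * rather than the lazy quantifier *?. That is a problem because your whole string ends with </div></span>, so the first match runs on to the end of the string and swallows all the other matches.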
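The three points above can be checked on a tiny input. This is only a sketch: the <b> tags and the strings are hypothetical stand-ins for the real markup, chosen to keep each behaviour easy to see. It also shows the s (DOTALL) pattern modifier, which is an alternative to [\s\S] that makes . match newlines.

```php
<?php
// Hypothetical two-block input with newlines inside each block.
$text = "<b>\n-on_\n</b>\n<b>\n-on_\n</b>";

// Without the s modifier, "." stops at newlines, so nothing matches.
$dotCount = preg_match_all('~<b>.*-on_.*</b>~', $text);

// Greedy [\s\S]* runs to the last </b>: one match swallowing both blocks.
$greedyCount = preg_match_all('~<b>[\s\S]*-on_[\s\S]*</b>~', $text);

// Lazy [\s\S]*? stops at the first </b>: one match per block.
$lazyCount = preg_match_all('~<b>[\s\S]*?-on_[\s\S]*?</b>~', $text);

// Alternative: the s modifier lets "." match newlines as well.
$dotallCount = preg_match_all('~<b>.*?-on_.*?</b>~s', $text);

// Whitespace between closing tags must be allowed for explicitly.
$tail = "</div>\n        </span>";
$strictTail = preg_match('~</div></span>~', $tail);
$looseTail = preg_match('~</div>\s*</span>~', $tail);

var_dump($dotCount, $greedyCount, $lazyCount, $dotallCount, $strictTail, $looseTail);
```

For this input the counts come out as 0, 1, 2 and 2, and the two tail checks as 0 and 1.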