Question

I have a string ($str) from which I want to parse all the <li></li> tags. This is the string:

<li>Want this</li>DON'T WANT THIS<li>Want this</li>DON'T WANT THIS<li>Want this</li>...

This is the code I'm using:

$my_text= array();
preg_match('/<li>(.*?)<\/li>/', $str, $my_text);

But it doesn't work. When I run it, this is the $my_text array:

[0] => "<li>Want this</li>"
[1] => "Want this"

It contains only 2 elements, out of 1000.

Answer 1

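Toto is correct, and it is a very simple fix: preg_match() stops after the first match, while preg_match_all() keeps searching until the end of the string.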

$str = "<li>Want this</li>DON'T WANT THIS<li>Want this</li>DON'T WANT THIS<li>Want this</li>";

$my_text= array();
preg_match_all('/<li>(.*?)<\/li>/', $str, $my_text);
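For reading the result, here is a minimal sketch of how the array is laid out with the default PREG_PATTERN_ORDER flag. The variable names $count and $items are only illustrative and do not come from the question.

```php
<?php
$str = "<li>Want this</li>DON'T WANT THIS<li>Want this</li>DON'T WANT THIS<li>Want this</li>";

$my_text = array();
// preg_match_all() returns the number of full-pattern matches (or false on error).
$count = preg_match_all('/<li>(.*?)<\/li>/', $str, $my_text);

// With the default PREG_PATTERN_ORDER flag:
//   $my_text[0] holds every full match, including the <li> tags
//   $my_text[1] holds the text captured by the first group
$items = $my_text[1];

echo $count, "\n"; // 3 for the sample string above
print_r($items);   // the three "Want this" strings
```

So the values you are after are in $my_text[1], not in the top-level array.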

Answer 2

May I suggest another solution, based on SimpleXML and xpath queries?

<?php
$string = "<html>
            <li>Want this</li>DON'T WANT THIS<li>Want this</li>DON'T WANT THIS<li>Want this</li>
        </html>";

$xml = simplexml_load_string($string);
# select only the li elements where the text is equal to...
$elements = $xml->xpath("//li[text() = 'Want this']");
print_r($elements);
// yields a list of your desired elements
?>
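The query above only returns li elements whose text is exactly 'Want this'. If the items differ from one another, a variant like the following might be closer to what you need. It is only a sketch, assuming the input is well-formed XML (SimpleXML will not load it otherwise); the $texts array is an illustrative name.

```php
<?php
$string = "<html>
            <li>Want this</li>DON'T WANT THIS<li>Want this</li>DON'T WANT THIS<li>Want this</li>
        </html>";

// simplexml_load_string() returns false if the input is not well-formed XML.
$xml = simplexml_load_string($string);
if ($xml === false) {
    throw new RuntimeException("Input is not well-formed XML");
}

// "//li" selects every li element, whatever its text is.
$texts = array();
foreach ($xml->xpath('//li') as $li) {
    $texts[] = (string) $li; // cast the SimpleXMLElement to get its text content
}
print_r($texts);
```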

Hint: your regex would work as well, see a demo on regex101.com. Consider using other delimiters, so that the slash in </li> no longer needs escaping:

$regex = '~<li>(.+?)</li>~';
preg_match_all($regex, $string, $matches);
print_r($matches);

Answer 3

You just need to use the preg_match_all() function, like this:

<?php

$str = "<li>Want this</li>DON'T WANT THIS<li>Want this</li>DON'T WANT THIS<li>Want this</li>";
preg_match_all('/<li>(.*?)<\/li>/', $str, $out);
echo '<pre>';
print_r($out);


Answer 4

Use preg_match_all, as mentioned above. It really is the best solution.

preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $input, $result, PREG_SET_ORDER);

The example above matches the content of any HTML tag in the input, not just li.
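To show what PREG_SET_ORDER changes, here is a small sketch. It swaps in a pattern restricted to li (as in the other answers) rather than the generic one, and the sample $input is an assumption made for illustration.

```php
<?php
$input = "<li>Want this</li>DON'T WANT THIS<li>Want this</li>";

// Restricting the pattern to li avoids matching other tags.
preg_match_all('~<li>(.*?)</li>~', $input, $result, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $result is one complete match:
//   $set[0] is the full match, $set[1] is the first captured group.
foreach ($result as $set) {
    echo $set[1], "\n";
}
```

With this flag you loop over matches instead of indexing into $result[1], which can be easier to read when the pattern has several groups.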

PHP preg match not working
