Question

I have the following HTML markup:

<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>

I want to grab the value 3,840 from the last li, the one labeled "Downloads:".

What do you suggest?

My attempt:

preg_match('/<li><strong>Downloads:<\/strong>(.*?)<\/li>/s', $s, $a);

Answer 1

I'd suggest using an HTML parser here, specifically DOMDocument together with XPath.

Example:

$markup = '<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
// Take the string value of the text node that follows the <strong> whose text is "Downloads:"
$download = trim($xpath->evaluate("string(//strong[text()='Downloads:']/following-sibling::text())"));
echo $download; // 3,840
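
If the same lookup is needed for other labels, the idea above can be wrapped in a small function. This is only a sketch: the name valueAfterLabel and its signature are made up for illustration and are not part of DOMDocument or the original answer. It compares the label in PHP rather than inside the XPath expression, so the label never has to be quoted into the query.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text that
// directly follows a <strong> label inside an <li>, or null if no label matches.
function valueAfterLabel(string $html, string $label): ?string
{
    $dom = new DOMDocument();
    // HTML fragments can make libxml emit warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//li/strong') as $strong) {
        if (trim($strong->textContent) === $label) {
            // The value is assumed to be the node right after the <strong> element.
            $next = $strong->nextSibling;
            return $next !== null ? trim($next->textContent) : null;
        }
    }
    return null;
}

$markup = '<ul>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>';

$downloads = valueAfterLabel($markup, 'Downloads:');
echo $downloads, "\n";                            // 3,840
echo (int) str_replace(',', '', $downloads), "\n"; // 3840
```

The last line shows one way to turn the extracted string into an integer by dropping the thousands separator first.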

Answer 2

Use an HTML parser to parse HTML. If you insist on a regular expression, you could try the following:

<li>[^<>]*<strong>Downloads:<\/strong>\s*\K.*?(?=\s*<\/li>)

Code:

$string = <<<EOT
<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>
EOT;
$regex = '~<li>[^<>]*<strong>Downloads:<\/strong>\s*\K.*?(?=\s*<\/li>)~s';
if (preg_match($regex, $string, $m)) {
    $yourmatch = $m[0]; // \K discards the prefix, so the whole match is just the value
    echo $yourmatch; // 3,840
}
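
To make the pattern easier to read, here is the same expression written with PCRE's x (extended) flag, which ignores whitespace in the pattern and treats # as the start of a comment. This is a sketch for explanation only; the variable names $html and $regex are arbitrary, and the slashes are left unescaped because ~ is the delimiter.

```php
<?php
$html = <<<'EOT'
<ul>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>
EOT;

// The pattern from the answer, one piece per line.
$regex = '~
    <li> [^<>]*                  # opening tag, then anything that is not another tag
    <strong>Downloads:</strong>  # the label the match is anchored on
    \s* \K                       # skip whitespace, then forget everything matched so far
    .*?                          # the value, as short as possible
    (?= \s* </li> )              # stop before trailing whitespace and the closing tag
~sx';

if (preg_match($regex, $html, $m)) {
    echo $m[0], "\n"; // 3,840
}
```

Because of \K, the label and leading whitespace are not part of the reported match, so no capture group is needed and $m[0] already holds the value.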

PHP: Extract the string between two tags based on child content
