PHP: Extract the string between two tags by child content

Asked: 2014-10-19 10:17:37

Tags: php html regex html-parsing domdocument

I have the following HTML markup:

<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>

I want to grab 3,840 from the last li, identifying that li by its "Downloads:" label.

Any suggestions?

My attempt:

preg_match('/<li><strong>Downloads:<\/strong>(.*?)<\/li>/s', $s, $a);

2 answers:

Answer 0 (score: 3)

I would suggest using an HTML parser here, specifically DOMDocument together with XPath.

Example:

$markup = '<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
// string() returns the first text node that follows the <strong> whose text is exactly "Downloads:"
$download = trim($xpath->evaluate("string(//strong[text()='Downloads:']/following-sibling::text())"));
echo $download; // 3,840
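
If you need more than one field, the same idea extends to every li in the list. The following is only a minimal sketch under a few assumptions: the function name listItemsToMap is hypothetical, every li is assumed to follow the "<strong>Label:</strong> value" layout shown in the question, and the trailing colon is stripped from each label purely for convenience.

```php
<?php
// Hypothetical helper (not from the original answer): builds a label => value
// map from every <li> that looks like "<strong>Label:</strong> value".
function listItemsToMap(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $map = [];
    foreach ($xpath->query('//li/strong') as $strong) {
        // "Downloads:" becomes "Downloads" (assumption: labels end with a colon).
        $label = rtrim(trim($strong->textContent), ':');
        // Evaluate relative to the current <strong>: first text node after it.
        $value = trim($xpath->evaluate('string(following-sibling::text()[1])', $strong));
        $map[$label] = $value;
    }
    return $map;
}
```

With the $markup string above, listItemsToMap($markup)['Downloads'] should then hold the download count as a string. If two items share the same label, the later one overwrites the earlier one in this sketch.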

Answer 1 (score: 1)

Use an HTML parser to parse HTML. If you insist on a regex, you could try the following:

<li>[^<>]*<strong>Downloads:<\/strong>\s*\K.*?(?=\s*<\/li>)

Code:

$string = <<<EOT
<ul>
    <li>
        <strong>Online:</strong>
        2/14/2010 3:40 AM
    </li>
    <li>
        <strong>Hearing Impaired:</strong>
        No
    </li>
    <li>
        <strong>Downloads:</strong>
        3,840
    </li>
</ul>
EOT;
$regex = '~<li>[^<>]*<strong>Downloads:</strong>\s*\K.*?(?=\s*</li>)~s';
if (preg_match($regex, $string, $m)) {
    $yourmatch = $m[0]; // \K resets the match start, so $m[0] holds only the value
    echo $yourmatch; // 3,840
}
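
If \K feels obscure, a plain capture group does the same job. Here is a rough sketch, assuming the label text is passed in as a parameter; the function name regexValueAfterLabel is hypothetical. Unlike the attempt in the question, it allows whitespace between <li> and <strong>, and it runs the label through preg_quote so that special characters in it are matched literally.

```php
<?php
// Hypothetical helper: regex-only lookup of the text that follows a
// <strong> label inside an <li>. Returns null when the label is not found.
function regexValueAfterLabel(string $html, string $label): ?string
{
    // preg_quote escapes regex metacharacters in the label, including the "~" delimiter.
    $pattern = '~<li>\s*<strong>' . preg_quote($label, '~') . '</strong>\s*(.*?)\s*</li>~s';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1]; // group 1 is the value without surrounding whitespace
    }
    return null;
}
```

Calling regexValueAfterLabel($string, 'Downloads:') on the sample above should give the same value as the \K version. Like any regex over HTML, it will break as soon as the markup gains attributes or nested tags, so the parser approach remains the safer choice.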