Question

I want to read every tag attribute named title, as in the HTML sample below:

<html>
    <head>
        <title> </title>
    </head>
    <body>
        <div title="abc"> </div>
        <div> 
            <span title="abcd"> </span>
        </div>
        <input type="text" title="abcde">
    </body>
</html>

I tried this regex call, but it does not work:

preg_match('\btitle="\S*?"\b', $html, $matches);

Answer 1

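Just to follow up on my comment: regular expressions are not particularly safe or robust for handling HTML (and with some HTML there is little hope of anything working fully). Have a read of https://stackoverflow.com/a/1732454/1213708.

DOMDocument provides a more reliable approach. For the processing you are after, you can use XPath and search for any title attribute with //@title (the @ sign is the XPath notation for an attribute).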

$html = '<html>
<head>
   <title> </title>
</head>
 <body>
   <div title="abc"> </div>
   <div> 
           <span title="abcd"> </span>
   </div>
       <input type="text" title="abcde">
</body>
</html>';

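$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// //@title selects every title attribute node in the document.
foreach ($xpath->query('//@title') as $attr) {
    echo $attr->textContent . PHP_EOL;
}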

Output:

abc
abcd
abcde
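
Because //@title returns attribute nodes rather than elements, each result can also report which tag it belongs to. The following is a minimal sketch of that idea, not part of the original answer: it assumes the sample markup from the question, and $found is a hypothetical array name introduced only for illustration.

```php
<?php
// Sample markup condensed from the question.
$html = '<html><head><title> </title></head><body>
<div title="abc"> </div>
<div><span title="abcd"> </span></div>
<input type="text" title="abcde">
</body></html>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// $found (hypothetical name) holds "element name: title value" strings.
$found = [];
foreach ($xpath->query('//@title') as $attr) {
    // ownerElement is the element that carries this attribute node.
    $found[] = $attr->ownerElement->nodeName . ': ' . $attr->value;
}

echo implode(PHP_EOL, $found) . PHP_EOL;
```

With the sample above this should list div, span and input alongside their title values. The <title> element in the head is not reported, since it is an element and not an attribute.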

Answer 2

Here is a regex solution:

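// Capture the opening quote and reuse it (\1) so the value ends at the matching quote.
preg_match_all('~\s+title\s*=\s*(["\'])(?P<title>.*?)\1~s', $html, $matches);
// Read the named group directly rather than relying on array order.
foreach ($matches['title'] as $m) {
    echo $m . " ";
}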
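
As a side note on why the attempt in the question finds nothing: PHP's preg functions expect the pattern to be wrapped in delimiters, and the trailing \b cannot match there because the closing quote is followed by a space or >, neither of which is a word character. preg_match also stops at the first match, whereas preg_match_all collects all of them. A minimal sketch with those three points adjusted, assuming double-quoted values that contain no embedded double quotes (the shortened $html string and the $count variable are illustrative only):

```php
<?php
// Shortened, illustrative version of the markup from the question.
$html = '<div title="abc"> </div><span title="abcd"> </span>'
      . '<input type="text" title="abcde">';

// "~" serves as the pattern delimiter. The leading \b is kept so that
// names merely ending in "title" are skipped; the trailing \b is dropped.
$count = preg_match_all('~\btitle="([^"]*)"~', $html, $matches);

// $matches[1] holds the captured attribute values.
echo $count . PHP_EOL;
echo implode(',', $matches[1]) . PHP_EOL;
```

This stays a narrow pattern: it ignores single-quoted and unquoted values, and it would also match title="..." text appearing outside a tag, which is the sort of gap the DOMDocument approach avoids.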

Extracting tag attributes from HTML using a regular expression
