Extracting elements by class from an HTML document using DOMDocument

Asked: 2011-02-25 19:40:34

Tags: php domdocument

The DOMDocument class has methods for getting elements by id and by tag name (getElementById and getElementsByTagName), but none for getting them by class. Is there a way to do this?

For example, how would I select the div from the following markup?

<html>
...
<body>
...
<div class="foo">
...
</div>
...
</body>
</html>

3 answers:

Answer 0: (score: 12)

The simple answer is to use XPath:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Matches only elements whose class attribute is exactly "foo".
$div = $xpath->query('//*[@class="foo"]')->item(0);

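But that only matches when the class attribute is exactly "foo", so it misses elements that carry several space-separated classes. To select by one class out of a space-separated list, use the following query (replace class with the class name you are looking for):

//*[contains(concat(' ', normalize-space(@class), ' '), ' class ')]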
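As a minimal sketch of how that query might be wrapped for reuse, the function below substitutes a class name for the placeholder class. The name findByClass and the sample markup are assumptions made for illustration and are not part of the answer. It also assumes the class name contains no quote characters, since it is interpolated directly into the XPath string.

```php
<?php
// Hypothetical helper: returns every element that carries $class
// among its space-separated class names.
function findByClass(DOMDocument $dom, string $class)
{
    $xpath = new DOMXPath($dom);
    // Pad both sides with spaces so only whole class names match.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    return $xpath->query($query);
}

// Made-up sample markup: "foobar" must not match a search for "foo".
$html = '<html><body><div class="foo bar">A</div><div class="foobar">B</div><div class="foo">C</div></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);

$matches = findByClass($dom, 'foo');
foreach ($matches as $node) {
    echo $node->textContent, "\n"; // prints A, then C
}
```

The div with class "foobar" is skipped because the padded comparison looks for " foo " as a whole token.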

Answer 1: (score: 2)

$html = '<html><body><div class="foo">Test</div><div class="foo">ABC</div><div class="foo">Exit</div><div class="bar"></div></body></html>';

$dom = new DOMDocument();
@$dom->loadHtml($html);

$xpath = new DOMXPath($dom);

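// Every class attribute node in the document (4 in this sample).
$allClass = $xpath->query("//@class");
// Elements whose class attribute is exactly "bar" (1 in this sample).
$allClassBar = $xpath->query("//*[@class='bar']");

echo "There are " . $allClass->length . " elements with a class attribute<br>";

echo "There are " . $allClassBar->length . " elements with a class attribute of 'bar'<br>";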
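The node lists returned by query() can also be iterated to read each match. A short sketch, reusing the sample markup from this answer; libxml_use_internal_errors() stands in for the @ operator here as one possible way to keep parser warnings quiet, which is a substitution and not what the answer itself does.

```php
<?php
$html = '<html><body><div class="foo">Test</div><div class="foo">ABC</div><div class="foo">Exit</div><div class="bar"></div></body></html>';

// Collect parser warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Gather the text of every element whose class attribute is exactly "foo".
$texts = [];
foreach ($xpath->query("//*[@class='foo']") as $node) {
    $texts[] = $node->textContent;
}

echo implode(', ', $texts), "\n"; // prints: Test, ABC, Exit
```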

Answer 2: (score: 0)

In addition to ircmaxell's answer, if you need to select by one of several space-separated classes:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$classname = 'foo';
// Targets div to match the markup in the question; use another tag name or * as needed.
$div = $xpath->query("//div[contains(@class, '$classname')]")->item(0);
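One caveat worth checking: contains(@class, ...) is a substring test, so it can also match class names that merely contain the search string. The sketch below compares it with the space-padded query from the first answer; the sample markup is made up for illustration.

```php
<?php
$html = '<html><body><div class="foobar">A</div><div class="foo baz">B</div></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$classname = 'foo';

// Substring test: matches both "foobar" and "foo baz".
$loose = $xpath->query("//div[contains(@class, '$classname')]");

// Whole-token test: matches only "foo baz".
$strict = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

echo $loose->length, " loose, ", $strict->length, " strict\n"; // prints: 2 loose, 1 strict
```

If class names in your documents never overlap as substrings, the shorter query may be enough; otherwise the padded form is the safer choice.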