Question

Possible duplicates:
  crawling a html page using php?
  Best methods to parse HTML

My PHP script has a string variable that contains an HTML page. How can I extract DOM elements from this string?

For example, from the string '<div class="someclass">text</div>' I want to get the value 'text'. How can I do that?

Answer 1

You need to use the DOMDocument class and, more specifically, its loadHTML method, to load your HTML string into a DOM object.

For example:

$string = <<<HTML
<p>test</p>
<div class="someclass">text</div>
<p>another</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($string);

After that, you can manipulate the DOM, for instance by running XPath queries against it with the DOMXPath class.

In your case, you could use something based on this portion of code:

$xpath = new DOMXPath($dom);
$result = $xpath->query('//div[@class="someclass"]');
if ($result->length > 0) {
    var_dump($result->item(0)->nodeValue);
}

Here, it gives you the following output:

string 'text' (length=4)
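
The two steps above can be combined into one reusable helper. This is only a minimal sketch: the function name `extractTextByClass` is hypothetical, and the `libxml_use_internal_errors` call is an added assumption for tolerating real-world markup that is not perfectly valid. It also matches elements carrying several classes, which the exact `@class="someclass"` comparison would miss. It assumes `$html` is non-empty and `$class` is a plain class name without quotes.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// every element whose class list contains $class.
// Assumes $html is non-empty and $class contains no quote characters.
function extractTextByClass(string $html, string $class): array
{
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so $class matches as a whole token,
    // e.g. class="box someclass", but not class="someclass2".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = $node->nodeValue;
    }
    return $texts;
}

$html = '<p>test</p><div class="box someclass">text</div><p>another</p>';
$texts = extractTextByClass($html, 'someclass');
print_r($texts); // $texts is ['text']
```

Returning an array rather than a single string keeps the helper usable when the page contains more than one matching element.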

As an alternative, you could also use simplexml_load_string and SimpleXMLElement::xpath instead of DOMDocument, but for complex manipulations I generally prefer DOMDocument.
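
For comparison, here is a rough sketch of the SimpleXML route. It assumes the input is well-formed XML with a single root element, which arbitrary HTML often is not; the `<root>` wrapper and the variable names are illustrative additions, not part of the original answer.

```php
<?php
// SimpleXML expects well-formed XML, so the fragment is wrapped in a single
// <root> element here (an assumption made for this sketch).
$string = '<root><p>test</p><div class="someclass">text</div><p>another</p></root>';

$xml = simplexml_load_string($string);
if ($xml === false) {
    throw new RuntimeException('The input is not well-formed XML.');
}

// xpath() returns an array of SimpleXMLElement objects on success.
$matches = $xml->xpath('//div[@class="someclass"]');

// Casting a SimpleXMLElement to string yields its text content.
$text = $matches ? (string) $matches[0] : null;

var_dump($text);
```

If the markup may be invalid, as scraped HTML pages usually are, DOMDocument::loadHTML is the more forgiving choice.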

Answer 2

Take a look at DOMDocument and DOMXPath.

$DOM = new DOMDocument();
$DOM->loadHTML($str);

$xpath = new DOMXPath($DOM);
$someclass_elements = $xpath->query('//*[@class = "someclass"]');
// ...
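
What follows the query might look like the sketch below, which loops over the returned DOMNodeList. The sample markup and the `$found` array are made up for illustration. Note that the query uses `//*` as the element wildcard, since XPath requires a node test after `//`.

```php
<?php
// Sample input invented for this sketch.
$str = '<div class="someclass">first</div><span class="someclass">second</span>';

$DOM = new DOMDocument();
$DOM->loadHTML($str);

$xpath = new DOMXPath($DOM);
// "*" matches any element, so both the <div> and the <span> are selected.
$someclass_elements = $xpath->query('//*[@class = "someclass"]');

$found = [];
foreach ($someclass_elements as $element) {
    // nodeName is the tag name; nodeValue is the concatenated text content.
    $found[] = $element->nodeName . ': ' . $element->nodeValue;
}

print_r($found); // ['div: first', 'span: second']
```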

Extracting DOM elements from a string in PHP
