DOM loadHTML: extract nodes and child nodes

Date: 2015-03-17 16:55:01

Tags: php dom xpath foreach nodes

I have a list of items. For each list item I need to get the title attribute, the link URL, the displayed link text, and the value of the span.

<ul>
<li class="testclass" title="Title 1 goes here">
<a href="http://examplelink1.com">List Text 1</a>
<span>Second List Text 1</span>
</li>
<li class="testclass" title="Title 2 goes here">
<a href="http://examplelink2.com">List Text 2</a>
<span>Second List Text 2</span>
</li>
</ul>

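How can I use foreach to extract each individual list item together with its values? Afterwards I need to insert the values into a MySQL database, with each value going into a different field.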

So far I have only managed to get them separately:

<?php
$doc = new DOMDocument();
@$doc->loadHTML($list);
$imageTags = $doc->getElementsByTagName('a'); 
foreach($imageTags as $tag) {
$link = $tag->getAttribute('href');
echo $link.'<br/>';
}
?>

<?php
$doc = new DOMDocument();
@$doc->loadHTML($list);
$imageTags = $doc->getElementsByTagName('li'); 
foreach($imageTags as $tag) {
$link = $tag->getAttribute('title');
echo $link.'<br/>';
}
?>
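For comparison, the two loops above can be merged into a single pass using only DOM methods: iterate over the li elements and look up the a and span children of each one. This is a minimal sketch, not the asker's code; the $rows array and its keys are illustrative names, and it assumes each li has at most one a and one span.

```php
<?php
// $list is the HTML from the question, redefined here so the sketch runs on its own.
$list = '<ul>
<li class="testclass" title="Title 1 goes here">
<a href="http://examplelink1.com">List Text 1</a>
<span>Second List Text 1</span>
</li>
<li class="testclass" title="Title 2 goes here">
<a href="http://examplelink2.com">List Text 2</a>
<span>Second List Text 2</span>
</li>
</ul>';

$doc = new DOMDocument();
$doc->loadHTML($list);

$rows = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // item(0) returns null when the <li> has no such child, so guard before using it.
    $a    = $li->getElementsByTagName('a')->item(0);
    $span = $li->getElementsByTagName('span')->item(0);

    // One associative array per <li>, holding all four values together.
    $rows[] = [
        'title'       => $li->getAttribute('title'),
        'href'        => $a ? $a->getAttribute('href') : '',
        'link-text'   => $a ? $a->textContent : '',
        'description' => $span ? $span->textContent : '',
    ];
}

print_r($rows);
```

Keeping the four values in one array per li is what makes the later database insert straightforward, since each array maps to one row.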

I found a script that uses XPath, but I don't know how to apply it correctly to get the specific values I need and use them in a MySQL statement:

<?php
$dom = new DOMDocument();
@$dom->loadHTML($list);
$xpath = new DOMXPath($dom);
$elements = $xpath->query("//*");
foreach ($elements as $element) {
echo "<p>". $element->nodeName. "</p>";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "<br/>";
}
}
?>

1 answer:

Answer (score: 2)

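Use DOMXPath::evaluate(). It is part of ext/dom and lets you fetch both nodes and values from the DOM with XPath expressions. Unlike DOMXPath::query(), which always returns a node list, evaluate() also returns scalar results, so an expression such as string(a/@href) gives you the value directly as a PHP string.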
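A small sketch of that difference, assuming a one-item version of the question's list: a plain location path yields a DOMNodeList, while wrapping it in string() or count() makes evaluate() return a PHP string or float instead.

```php
<?php
// One-item version of the question's list (assumed input for this sketch).
$html = '<ul><li class="testclass" title="Title 1 goes here">'
      . '<a href="http://examplelink1.com">List Text 1</a>'
      . '<span>Second List Text 1</span></li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// A location path evaluates to a node-set, returned as a DOMNodeList.
$items = $xpath->evaluate('//ul/li');

// string(...) evaluates to an XPath string, returned as a PHP string.
$href = $xpath->evaluate('string(//ul/li/a/@href)');

// count(...) evaluates to an XPath number, returned as a PHP float.
$count = $xpath->evaluate('count(//ul/li)');

var_dump($items->length, $href, $count);
```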

$dom = new DOMDocument();
// $list holds the HTML from the question
$dom->loadHTML($list);
$xpath = new DOMXPath($dom);

// use an xpath expression to fetch the li nodes
foreach ($xpath->evaluate('//ul/li[@class="testclass"]') as $li) {
  var_dump(
    [
      // this is a direct attribute of the li node, use dom method
      'title' => $li->getAttribute('title'),
      // more complex, use an xpath expression
      'href' => $xpath->evaluate('string(a/@href)', $li),
      // Cast the node to a string to return the text content
      'link-text' => $xpath->evaluate('string(a)', $li),
      // works for the span, too
      'description' => $xpath->evaluate('string(span)', $li)
    ]
  );
}

Output:

array(4) {
  ["title"]=>
  string(17) "Title 1 goes here"
  ["href"]=>
  string(23) "http://examplelink1.com"
  ["link-text"]=>
  string(11) "List Text 1"
  ["description"]=>
  string(18) "Second List Text 1"
}
array(4) {
  ["title"]=>
  string(17) "Title 2 goes here"
  ["href"]=>
  string(23) "http://examplelink2.com"
  ["link-text"]=>
  string(11) "List Text 2"
  ["description"]=>
  string(18) "Second List Text 2"
}
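Since the goal is to store each value in its own MySQL column, the loop above can be turned into a function that returns rows, and the rows then passed to a PDO prepared statement. This is a sketch under assumptions: the helper names extractItems and insertItems, the table list_items and its column names are hypothetical, and PDO is just one possible database layer (mysqli would work as well).

```php
<?php
// Hypothetical helper: collect one associative array per matching <li>.
function extractItems(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $items = [];
    foreach ($xpath->evaluate('//ul/li[@class="testclass"]') as $li) {
        $items[] = [
            'title'       => $li->getAttribute('title'),
            'href'        => $xpath->evaluate('string(a/@href)', $li),
            'link_text'   => $xpath->evaluate('string(a)', $li),
            'description' => $xpath->evaluate('string(span)', $li),
        ];
    }
    return $items;
}

// Hypothetical helper: insert the rows with a prepared statement.
// The table `list_items` and its columns are assumptions, not from the question.
function insertItems(PDO $pdo, array $items): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO list_items (title, href, link_text, description)
         VALUES (:title, :href, :link_text, :description)'
    );
    foreach ($items as $item) {
        // The array keys match the named placeholders, so each value goes to its own column.
        $stmt->execute($item);
    }
}
```

Usage would look like insertItems(new PDO('mysql:host=localhost;dbname=yourdb;charset=utf8mb4', $user, $password), extractItems($list)), where the DSN and credentials are placeholders. Binding the values through placeholders keeps the scraped text from ever being interpreted as SQL.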