I have a page that looks like this:
...
<div class="container">
<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 3</h3>
<span class="title">Title for Info 3</span>
<a href="http://www.example.com/3">Link to Example 3</a>
</div> <!-- /info -->
</div> <!-- /container -->
...
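Every div with the class info has the same structure. I want to loop through the document and, for each div with the info class, parse its components into an array or individual variables, so that I can output the data in some human-readable format such as a CSV file or an HTML table.
I have tried the DOMDocument approach, using getElementsByTagName to pull out the contents of each tag, but because each div contains several tag types (h3, a, span), I haven't figured out how to make it work the way I want.
Ultimately, I'd like to be able to output the data in a format like this: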
divclass, h3, spanclass, spantitle, ahref, a
info, Info 1, title, Title for Info 1, http://www.example.com/1, Link to Example 1
...
Thanks!
Answer 0 (score: 4)
<?php
$html = '
<div class="container">
<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 3</h3>
<span class="title">Title for Info 3</span>
<a href="http://www.example.com/3">Link to Example 3</a>
</div> <!-- /info -->
</div> <!-- /container -->
';
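$dom_document = new DOMDocument();
// Parser properties must be set before loading; set afterwards they do nothing.
$dom_document->preserveWhiteSpace = false;
$dom_document->loadHTML($html);

// Use DOMXPath to navigate the parsed HTML.
$dom_xpath = new DOMXPath($dom_document);
$elements = $dom_xpath->query("//*[@class='info']");

// query() returns false (not null) when the expression is invalid.
if ($elements !== false) {
    foreach ($elements as $element) {
        echo "\n[" . $element->nodeName . "]\n";
        foreach ($element->childNodes as $node) {
            // Whitespace text nodes and comments can still be present,
            // so print only element children (h3, span, a).
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue;
            }
            echo $node->nodeValue . "\n";
        }
    }
}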
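To get from the loop above to the CSV layout asked for in the question (divclass, h3, spanclass, spantitle, ahref, a), one option is to run a relative XPath query per tag inside each div and collect the results as rows. The following is only a minimal sketch under a few assumptions: the helper name extract_info_rows is hypothetical, the sample markup is a shortened copy of the question's, each div is assumed to contain at most one h3, span and a, and the class attribute is assumed to be exactly "info" (a div with several classes would not match this expression).

```php
<?php
// Shortened sample input (assumption); in practice this is the full page source.
$html = '
<div class="container">
<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->
<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->
</div> <!-- /container -->
';

// Hypothetical helper: returns one array per div.info, in the column order
// divclass, h3, spanclass, spantitle, ahref, a.
function extract_info_rows(string $html): array
{
    $doc = new DOMDocument();
    // Collect parse warnings internally instead of printing them,
    // since real-world HTML is often not perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query("//div[@class='info']") as $div) {
        // "./tag" is evaluated relative to the current div (second argument).
        $h3   = $xpath->query('./h3', $div)->item(0);
        $span = $xpath->query('./span', $div)->item(0);
        $a    = $xpath->query('./a', $div)->item(0);

        // Missing elements become empty strings rather than errors.
        $rows[] = [
            $div->getAttribute('class'),
            $h3 ? trim($h3->textContent) : '',
            $span ? $span->getAttribute('class') : '',
            $span ? trim($span->textContent) : '',
            $a ? $a->getAttribute('href') : '',
            $a ? trim($a->textContent) : '',
        ];
    }
    return $rows;
}

$rows = extract_info_rows($html);

// Write the header and the rows as CSV to standard output.
$out = fopen('php://output', 'w');
fputcsv($out, ['divclass', 'h3', 'spanclass', 'spantitle', 'ahref', 'a'], ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Because the rows are plain arrays, the same $rows could instead be looped over to print an HTML table. Note that fputcsv() decides on its own which fields to wrap in quotes, so the output may not match the hand-written sample in the question character for character.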