Given this HTML:
<div id="testid">
<h1>Test Title</h1>
<ul class="clearfix">
<li class="anker" id="artists-A"></li>
<li class="first">
<a href="www.test1.html" title="Test1">
<span>
<img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
<span>Test1</span>
</span>
</a>
</li>
<li>
<a href="www.test2.html" title="Test2">
<span>
<img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
<span>Test2</span>
</span>
</a>
</li>
<li class="first">
<a href="www.test3.html" title="Test3">
<span>
<img src="https://www.test1.de/img/test3.jpg" alt="Test3" />
<span>Test3</span>
</span>
</a>
</li>
</ul>
</div>
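I need to get the href value, the img src, and the text of the inner span (the title) for each list item. I am parsing the page with DOMDocument, but I am not getting the expected result.
Code: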
$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents($url));
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="testid"]/ul/li');
Answer 0 (score: 1)
Answer 1 (score: 0)
I suggest using the SimpleHtmlDom library.
<?php
// Adjust this path to wherever your copy of the library lives.
require_once "SimpleHtmlDom.php";
// Put the HTML to parse in the file contentToParse.html.
$htmlToParse = file_get_contents("contentToParse.html");
$htmlObject = str_get_html($htmlToParse);
// Link targets: the href attribute of every <a> in the list.
foreach ($htmlObject->find("#testid ul li a") as $singleLink)
{
    var_dump($singleLink->href);
}
// Image URLs: the src attribute of every <img> in the list.
foreach ($htmlObject->find("#testid ul li img") as $singleImage)
{
    var_dump($singleImage->src);
}
// Titles: the text of the inner <span> nested inside the outer <span>.
foreach ($htmlObject->find("#testid ul li span span") as $singleSpan)
{
    var_dump($singleSpan->innertext);
}
?>
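Since the question already uses DOMDocument, the same three values can probably be read with DOMXPath alone, without an extra library. The following is only a sketch under a few assumptions: the markup is a trimmed copy of the HTML from the question, inlined so the script runs on its own, and the `$items` array and its keys (`href`, `src`, `title`) are names chosen here for illustration. It uses relative XPath queries with each `<li>` as the context node, so the link, image, and title of one item stay together.

```php
<?php
// Assumption: a trimmed copy of the question's markup, inlined for a self-contained run.
$html = <<<'HTML'
<div id="testid">
<h1>Test Title</h1>
<ul class="clearfix">
<li class="anker" id="artists-A"></li>
<li class="first">
<a href="www.test1.html" title="Test1">
<span>
<img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
<span>Test1</span>
</span>
</a>
</li>
<li>
<a href="www.test2.html" title="Test2">
<span>
<img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
<span>Test2</span>
</span>
</a>
</li>
</ul>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = []; // hypothetical result structure: one entry per list item

// li[a] keeps only list items that contain a link, so the empty "anker" item is skipped.
foreach ($xpath->query('//*[@id="testid"]/ul/li[a]') as $li) {
    // The second argument makes each query relative to the given context node.
    $link  = $xpath->query('a', $li)->item(0);
    $image = $xpath->query('.//img', $link)->item(0);
    $title = $xpath->query('span/span', $link)->item(0);

    $items[] = [
        'href'  => $link->getAttribute('href'),
        'src'   => $image ? $image->getAttribute('src') : null,
        'title' => $title ? trim($title->textContent) : null,
    ];
}

print_r($items);
```

To run it against the live page instead, replace the inlined string with the `file_get_contents($url)` call from the question. Grouping the values per `<li>` keeps them aligned even if one item lacks an image or a title, which three independent loops would not guarantee.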