I need to extract data from an HTML page like the one below:
<li>
<h2>
<a href="/rss/Football/actu_rss_35.xml" target="_blank" class="rss"><span>rss</span></a>AC Ajaccio</h2>
<div class="club-left">
<a href="/Football/FootballFicheClub35.html" title="AC Ajaccio"><img src="http://medias.lequipe.fr/logo-football/35/60?CCH-13-40" width="60" height="60"></a>
</div>
<div class="club-right">
<ul class="club-links">
<li><span class="plus"></span>
<a href="/Football/FootballFicheClub35.html">Fiche club </a>
</li>
<li><span class="plus"></span>
<a href="/Football/FootballFicheClub35.html#Calendrier">Calendrier</a>
</li>
<li><span class="plus"></span><a href="/Football/FootballFicheClub35.html#Effectif">Effectif</a>
</li>
<li><span class="plus"></span>
<a href="/Football/FootballFicheClub35.html#Joueurs">Stats joueurs</a>
</li>
<li><span class="plus"></span>
<a href="/Football/FootballFicheClub35.html#Statistiques">Stats club</a>
</li>
</ul>
</div>
<div class="clubt hidden">35</div>
<div class="clear"></div>
</li>
In PHP, I want to extract the href value and the link text from this part:
<a href="**/Football/FootballFicheClub35.html#Joueurs**">**Stats joueurs**</a>
I am using the following code, but something is missing:
$elements = $xpath->query("//div[@id='Base']/ul/li");
if (!is_null($elements)) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            if ($node->nodeName != '#text') {
                echo $node->nodeValue.";<br/>";
                $stringData = trim($node->nodeValue).";";
            }
        }
    }
}
Answer 0 (score: 1)
Update:
Try:
$elements = $xpath->query("//ul[@class='club-links']//a");
foreach ($elements as $element) {
    echo $element->nodeValue." - ".$element->getAttribute("href")."<br/>";
}
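The query above returns every link in the club-links list. If only the "Stats joueurs" link is needed, one option is to narrow the XPath expression with contains() on the href. The following is a minimal self-contained sketch, not a definitive solution: the sample markup is trimmed from the question and wrapped in a <ul> (an assumption about the real page), and extractPlayerStatsLinks is a hypothetical helper name introduced only for illustration.

```php
<?php
// Sample markup trimmed from the question. The outer <ul> wrapper is an
// assumption; on the real page, load the full document instead.
$html = <<<'HTML'
<ul>
<li>
<h2>AC Ajaccio</h2>
<div class="club-right">
<ul class="club-links">
<li><span class="plus"></span><a href="/Football/FootballFicheClub35.html">Fiche club </a></li>
<li><span class="plus"></span><a href="/Football/FootballFicheClub35.html#Joueurs">Stats joueurs</a></li>
<li><span class="plus"></span><a href="/Football/FootballFicheClub35.html#Statistiques">Stats club</a></li>
</ul>
</div>
</li>
</ul>
HTML;

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical helper: returns href/text pairs for every link inside
// ul.club-links whose href contains the "#Joueurs" fragment.
function extractPlayerStatsLinks(DOMXPath $xpath): array
{
    $links = [];
    $nodes = $xpath->query("//ul[@class='club-links']//a[contains(@href, '#Joueurs')]");
    foreach ($nodes as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->nodeValue),
        ];
    }
    return $links;
}

$links = extractPlayerStatsLinks($xpath);
foreach ($links as $link) {
    echo $link['text'], ' - ', $link['href'], PHP_EOL;
}
```

With the sample markup this prints one line, "Stats joueurs - /Football/FootballFicheClub35.html#Joueurs". On the full page, which has one such list per club, the same query should yield one entry per club.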