How to extract values from part of an HTML page in PHP

Time: 2013-10-07 13:30:59

Tags: php html

I need to extract data from an HTML page that looks like this:

<li>
    <h2>
        <a href="/rss/Football/actu_rss_35.xml" target="_blank" class="rss"><span>rss</span></a>AC Ajaccio</h2>
    <div class="club-left">
        <a href="/Football/FootballFicheClub35.html" title="AC Ajaccio"><img src="http://medias.lequipe.fr/logo-football/35/60?CCH-13-40" width="60" height="60"></a>
    </div>
    <div class="club-right">
        <ul class="club-links">
            <li><span class="plus"></span>
                <a href="/Football/FootballFicheClub35.html">Fiche club </a>
            </li>
            <li><span class="plus"></span>
                <a href="/Football/FootballFicheClub35.html#Calendrier">Calendrier</a>
            </li>
            <li><span class="plus"></span><a href="/Football/FootballFicheClub35.html#Effectif">Effectif</a>
            </li>
            <li><span class="plus"></span>
                <a href="/Football/FootballFicheClub35.html#Joueurs">Stats joueurs</a>
            </li>
            <li><span class="plus"></span>
                <a href="/Football/FootballFicheClub35.html#Statistiques">Stats club</a>
            </li>
        </ul>
    </div>
    <div class="clubt hidden">35</div>
    <div class="clear"></div>
</li>

In PHP, I want to extract the href value and the link text from this part:

<a href="**/Football/FootballFicheClub35.html#Joueurs**">**Stats joueurs**</a> 

I am using the following code, but something is missing:

$elements = $xpath->query("//div[@id='Base']/ul/li");
// DOMXPath::query() returns false on a malformed expression, never null.
if ($elements !== false) {
  foreach ($elements as $element) {
    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      // Skip text nodes; only element children are printed.
      if ($node->nodeName != '#text') {
        echo $node->nodeValue.";<br/>";
        $stringData = trim($node->nodeValue).";";
      }
    }
  }
}

1 answer:

Answer 0: (score: 1)

Update:

Try selecting the links directly:

$elements = $xpath->query("//ul[@class='club-links']//a");
foreach ($elements as $element) {
  echo $element->nodeValue." - ".$element->getAttribute("href")."<br/>";
}
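
For completeness, here is a minimal, self-contained sketch of the same approach, including the setup that the snippets above leave out. It assumes the page markup is already available as a string (a trimmed copy of the question's HTML stands in for it here). The helper name `extractClubLinks` and the `#Joueurs` suffix filter are illustrative choices, not part of the original answer.

```php
<?php
// Sample markup trimmed from the question; a real page would be fetched or read from disk.
$html = <<<'HTML'
<ul class="club-links">
    <li><span class="plus"></span>
        <a href="/Football/FootballFicheClub35.html">Fiche club </a>
    </li>
    <li><span class="plus"></span>
        <a href="/Football/FootballFicheClub35.html#Joueurs">Stats joueurs</a>
    </li>
</ul>
HTML;

/**
 * Hypothetical helper: returns text/href pairs for every link inside ul.club-links.
 */
function extractClubLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for imperfect HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    // Same XPath expression as in the answer: every <a> below ul.club-links.
    foreach ($xpath->query("//ul[@class='club-links']//a") as $a) {
        $links[] = [
            'text' => trim($a->nodeValue),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

$links = extractClubLinks($html);

// Keep only the links whose href ends with the "#Joueurs" fragment.
$joueurs = array_values(array_filter($links, function (array $link): bool {
    return substr($link['href'], -strlen('#Joueurs')) === '#Joueurs';
}));

foreach ($joueurs as $link) {
    echo $link['text'] . ' - ' . $link['href'] . "\n";
}
```

Note that `@class='club-links'` matches only when the class attribute is exactly that string, which holds for the markup in the question. If the list carried additional classes, a `contains(@class, 'club-links')` predicate would probably be needed instead.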