Get the content of a DIV element with PHP DOMDocument

Asked: 2014-10-30 13:46:31

Tags: php html domdocument

I need to retrieve some news items from a div on a website. The div is structured as follows:

HTML markup:

<ul id="news-accordion" class="rounded" style="padding: 2px;">
   <li class="o">
         <h3>
            <span>TITLE ARTICLE</span>
            <span>30/10/2014</span>
         </h3>
         <div style="display: none;">
              <p>text of article</p>
         </div>
   </li>
   <li class="e">
         <h3>
            <span>TITLE ARTICLE</span>
            <span>28/10/2014</span>
         </h3>
         <div style="display: none;">
              <p>text of article</p>
         </div>
   </li>
   <li class="o">
         <h3>
            <span>TITLE ARTICLE</span>
            <span>29/10/2014</span>
         </h3>
         <div style="display: none;">
              <p>text of article</p>
         </div>
   </li>                                                     
</ul>

PHP

<?php 

$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('http://www.xxxxxxxxx/news.php'));

$news = $doc->getElementById('news-accordion');

$li = $news->getElementsByTagName('li'); 

foreach ($li as $row){ 

    $title = $row->getElementsByTagName('h3'); 
    echo $title->item(0)->nodeValue."<br><br>"; 

    /*foreach ($title as $row2){ 
    echo $row2->nodeValue."<br><br>";
    //echo $row2->item(0)->nodeValue."<br><br>"; 
    }*/

    $text = $row->getElementsByTagName('p'); 
    echo utf8_decode($text->item(0)->nodeValue)."<br><br><br>"; 

}

?>

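The code works, but when I print the contents of the span tags with echo $title->item(0)->nodeValue;, the text of both spans is printed together.

How can I get the contents of the two spans separately? Thanks.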

2 answers:

Answer 0 (score: 2)

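Yes, you can: just adjust the ->item() index. As you already did for the other elements, first point to the h3 heading element, then explicitly target its span children: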

foreach ($li as $row){ 

    $h3 = $row->getElementsByTagName('h3')->item(0);
    $title = $h3->getElementsByTagName('span')->item(0); // first span
    $date = $h3->getElementsByTagName('span')->item(1); // second span

    echo $title->nodeValue . '<br/>';
    echo $date->nodeValue . '<br/>';

    $text = $row->getElementsByTagName('p'); 
    echo utf8_decode($text->item(0)->nodeValue)."<br><br><br>"; 

}
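
For anyone who wants to try this without fetching the remote page, here is a minimal self-contained sketch of the same approach. The inline markup, its titles and texts, and the $items array are placeholders made up for illustration; the null and length checks are an extra precaution, not something the original code had.

```php
<?php
// Inline sample markup modelled on the question's list (titles and texts are placeholders).
$html = <<<'HTML'
<ul id="news-accordion">
  <li class="o">
    <h3><span>First title</span><span>30/10/2014</span></h3>
    <div style="display: none;"><p>first text</p></div>
  </li>
  <li class="e">
    <h3><span>Second title</span><span>28/10/2014</span></h3>
    <div style="display: none;"><p>second text</p></div>
  </li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$items = [];
foreach ($doc->getElementById('news-accordion')->getElementsByTagName('li') as $row) {
    // Scope the span lookup to the heading of this list item.
    $h3 = $row->getElementsByTagName('h3')->item(0);
    if ($h3 === null) {
        continue; // skip list items without a heading
    }
    $spans = $h3->getElementsByTagName('span');
    $p = $row->getElementsByTagName('p')->item(0);

    $items[] = [
        'title' => $spans->length > 0 ? trim($spans->item(0)->nodeValue) : '',
        'date'  => $spans->length > 1 ? trim($spans->item(1)->nodeValue) : '',
        'text'  => $p !== null ? trim($p->nodeValue) : '',
    ];
}

// Print one line per news item: title, date, text.
foreach ($items as $item) {
    echo $item['title'], ' | ', $item['date'], ' | ', $item['text'], PHP_EOL;
}
```

Collecting the values into an array first keeps the extraction separate from the output, so the echo loop can be swapped for whatever you actually need.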

Answer 1 (score: -1)

$title = $row->getElementsByTagName('h3'); 
echo $title->item(0)->nodeValue."<br><br>"; 

Replace the two lines above with the following (look up the span tags instead of the h3 tag):

$title = $row->getElementsByTagName('span'); 
echo $title->item(0)->nodeValue."<br><br>"; 
echo $title->item(1)->nodeValue."<br><br>"; 

It worked for me.
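
One caveat worth keeping in mind: getElementsByTagName('span') called on the li returns every descendant span in document order, not only those inside the h3. The sketch below uses a hypothetical list item (no date span, plus a span inside the article body) to show how the second index can then pick up the wrong element, while a lookup scoped to the h3 returns null instead.

```php
<?php
// Hypothetical item: the date span is missing and the article body contains its own span.
$html = '<ul id="news-accordion"><li><h3><span>Title only</span></h3>'
      . '<div><p>Body with <span>inline span</span></p></div></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$row = $doc->getElementsByTagName('li')->item(0);

// Searching from the <li> sees every descendant span, including the one in the body.
$rowSpans = $row->getElementsByTagName('span');
$fromRow = $rowSpans->item(1)->nodeValue;

// Searching from the <h3> only sees the heading's own spans, so index 1 does not exist.
$h3Spans = $row->getElementsByTagName('h3')->item(0)->getElementsByTagName('span');
$fromH3 = $h3Spans->item(1);

echo 'From <li>: ', $fromRow, PHP_EOL;
echo 'From <h3>: ', $fromH3 === null ? '(no second span)' : $fromH3->nodeValue, PHP_EOL;
```

With the markup from the question both approaches give the same result, since each h3 holds exactly two spans and they come first inside the li.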