Getting the content of a child tag from an RSS feed

Date: 2014-01-26 11:04:36

Tags: php html xml xpath rss

I'm trying to get the content of <div class="start-teaser"> from this rss feed using the script below. I tried XPath, like this:

$xpath = new DOMXPath($html);
$desc = $xpath->query("//*[@class='start-teaser']");

But it doesn't pick anything up, and I don't understand why. I also tried this:

$desc = $html->getElementsByTagName('p')->item(0)->getAttribute('class');

But that only returns the class name. I need the content (text) of that div, not its class name.
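The gap between the two attempts is that getAttribute('class') reads an attribute value, while a DOM node's textContent property returns the text inside the element. A minimal sketch of the difference on a hard-coded fragment (the fragment, its class "example-teaser" and its text are hypothetical stand-ins, not taken from the real feed):

```php
<?php
// Hypothetical fragment standing in for one RSS <description> value.
$fragment = '<p class="example-teaser">Example teaser text</p>';

$html = new DOMDocument();
// loadHTML() emits libxml warnings for fragments; collect them instead of printing.
libxml_use_internal_errors(true);
$html->loadHTML($fragment);
libxml_clear_errors();

// loadHTML() wraps the fragment in <html><body>, so the <p> is the first match.
$p = $html->getElementsByTagName('p')->item(0);

$className = $p->getAttribute('class'); // attribute value: "example-teaser"
$text      = $p->textContent;           // element text: "Example teaser text"

echo $className, PHP_EOL;
echo $text, PHP_EOL;
```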

public function NewsRss() {
$rss = new DOMDocument();
$rss->load('http://www.autoexpress.co.uk/feeds/all');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
  $htmlStr = $node->getElementsByTagName('description')->item(0)->nodeValue;
  $html = new DOMDocument();        
  $html->loadHTML($htmlStr);
  $xpath = new DOMXPath($html);
  $desc = $xpath->query("//*[@class='start-teaser']");
  $imgTag = $html->getElementsByTagName('img');
  $img = ($imgTag->length==0)?'noimg.png':$imgTag->item(0)->getAttribute('src');
  $item = array (
    'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
    //'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
    'desc' => $desc,
    'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
    'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
    'image' => $img,
  );
  array_push($feed, $item);
}
$limit = 3;
for($x=0;$x<$limit;$x++) {
  $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
  $link = $feed[$x]['link'];
  $description = $feed[$x]['desc'];
  $date = date('l F d, Y', strtotime($feed[$x]['date']));
  echo '<div class="news-row-index">';
  echo '<div class="img"><a href="'.$link.'" target="_blank" title="'.$title.'"><img src="'.$feed[$x]['image'].'" height="79" width="89"></a></div>';
  echo '<div class="details-index"><p><h5><a href="'.$link.'" target="_blank" title="'.$title.'">'.$title.'</a></h5><br />';
  echo '<small><em>Posted on '.$date.'</em></small></p>';
  echo '<p>'.$feed[$x]['desc'].'</p></div>';
  echo '</div>';
}
echo '<a style="margin-left:10px;" class="view-all-but" target="_blank" href="http://www.autoexpress.co.uk/feeds/all">View all</a>';
}

1 answer:

Answer 0 (score: 1)

The class value is short-teaser, not start-teaser, so use //*[@class='short-teaser'] instead.
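Even with the corrected class, $xpath->query() returns a DOMNodeList rather than a string, so the question's code still has to pull the text out of a match before echoing it. A sketch of a helper, assuming the teaser is the first element whose class is exactly "short-teaser" (the function name teaserText and the sample markup are hypothetical):

```php
<?php
/**
 * Hypothetical helper: return the trimmed text of the first element whose
 * class attribute is exactly "short-teaser", or '' when there is none.
 */
function teaserText(string $descriptionHtml): string
{
    if (trim($descriptionHtml) === '') {
        return ''; // loadHTML() rejects an empty string
    }
    $html = new DOMDocument();
    libxml_use_internal_errors(true); // keep fragment warnings quiet
    $html->loadHTML($descriptionHtml);
    libxml_clear_errors();

    $xpath = new DOMXPath($html);
    $nodes = $xpath->query("//*[@class='short-teaser']"); // DOMNodeList (or false)

    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    return trim($nodes->item(0)->textContent);
}

// Invented sample description; the real feed markup may differ.
$sample = '<div class="short-teaser"> New model revealed </div><img src="car.jpg">';
echo teaserText($sample), PHP_EOL;
```

Inside the question's foreach loop, the array entry would then become 'desc' => teaserText($htmlStr), which stores plain text that can be concatenated into the echo statements.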

For matching HTML classes in general, also see this question: How can I match on an attribute that contains a certain string?
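An exact @class='short-teaser' comparison fails as soon as the element carries more than one class. The common XPath 1.0 idiom normalizes the attribute, pads it with spaces, and looks for the class as a whole space-delimited token, so a similar name such as "short-teaser-old" is not a false positive. A sketch on invented markup:

```php
<?php
$html = new DOMDocument();
libxml_use_internal_errors(true);
$html->loadHTML(
    '<div class="lead short-teaser">Teaser text</div>' .
    '<div class="short-teaser-old">Other</div>'
);
libxml_clear_errors();

$xpath = new DOMXPath($html);

// Exact match: misses the first div because its class attribute has two tokens.
$exact = $xpath->query("//*[@class='short-teaser']");

// Token match: " lead short-teaser " contains " short-teaser ",
// but " short-teaser-old " does not.
$token = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' short-teaser ')]"
);

echo $exact->length, PHP_EOL;               // 0
echo $token->length, PHP_EOL;               // 1
echo $token->item(0)->textContent, PHP_EOL; // Teaser text
```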