Question

I'm trying to extract content from this feed. Here is the code I'm using:

$rss  = new DOMDocument();
$rss->load($feed_url);

foreach ($rss->getElementsByTagName('entry') as $node) {
   $description = $node->getElementsByTagName('content')->item(0)->nodeValue;
   echo $description;
}

However, this echoes plain text instead of HTML. Here is the structure of the feed:

<entry>
<title>.....</title>
<link rel=".." type="..." href="...." />
...... More tags ......
<content type="xhtml" xml:lang="en-US"  xml:base="http://www.abeautifulmess.com/">
  <div xmlns="http://www.w3.org/1999/xhtml"> HTML is all here.
  </div>
</content>

This hasn't happened with any other feed. Is it because of the content type, or something else?

Answer 1

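Because the content is type="xhtml", the markup is stored as real child elements rather than escaped text, so nodeValue returns only their concatenated text. Using DOMDocument::saveHTML preserves the node's HTML markup, which should give you what you want: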

$feed_url = 'http://feeds.feedburner.com/a_beautiful_mess?format=xml';
$rss  = new DOMDocument();
$rss->load($feed_url);

foreach ($rss->getElementsByTagName('entry') as $node) {
   $description = $node->getElementsByTagName('content')->item(0);
   echo $rss->saveHTML($description);
}
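
To make the difference concrete, here is a minimal sketch that compares nodeValue with saveHTML on a small inline document. The sample XML is an assumption standing in for the real feed, and the variable names ($plain, $html) are only illustrative. It also loops over childNodes, since passing the content node itself to saveHTML serializes the surrounding content element as well.

```php
<?php
// Inline sample standing in for the real feed (assumed, heavily trimmed).
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"><p>Hello <b>world</b></p></div>
    </content>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$content = $doc->getElementsByTagName('content')->item(0);

// nodeValue concatenates the text of all descendants, so the tags are lost.
$plain = trim($content->nodeValue);

// Serializing each child keeps the markup and skips the <content> wrapper.
$html = '';
foreach ($content->childNodes as $child) {
    $html .= $doc->saveHTML($child);
}
$html = trim($html);

echo $plain, "\n";
echo $html, "\n";
```

If the output has to stay well-formed XHTML, swapping saveHTML for DOMDocument::saveXML in the loop is probably the safer choice, since saveHTML applies HTML serialization rules.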
