Question

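I'm looking for a simple way to parse the RSS description tag so that I can store the img link along with any text it contains.

I'm using lastRSS to parse the RSS feeds, but the description tag holds different information in every feed, and the feeds differ widely from one another.

I currently use Simple HTML DOM Parser to get the img link from the description, but this doesn't seem to work for every feed. For example, it fails to find the img in the following description from Dribbble: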

<![CDATA[<a height="150" href="http://dribbble.com/shots/1405514-sparrows" width="200"><img alt="sparrows" height="600" src="http://d13yacurqjgara.cloudfront.net/users/14152/screenshots/1405514/sparrows.jpg" width="800" /></a><p>We are currently working on identity for a new Canadian project called "sparrows". A place for creatives to gather some useful readings.</p>]]>

What is the best way to get this information?

Answer 1

The best approach is probably XPath, but assuming you have already started with Simple HTML DOM:

require('simple_html_dom.php');

$rss =<<<EOF
<description>
<![CDATA[<a height="150" href="http://dribbble.com/shots/1405514-sparrows" width="200"><img alt="sparrows" height="600" src="http://d13yacurqjgara.cloudfront.net/users/14152/screenshots/1405514/sparrows.jpg" width="800" /></a><p>We are currently working on identity for a new Canadian project called "sparrows". A place for creatives to gather some useful readings.</p>]]>
</description>
EOF;

$rss_dom = str_get_html($rss);

# find the cdata string from the description and strip out the CDATA part
$cdata = $rss_dom->find('description', 0)->text();
# the /s flag lets the pattern span line breaks; .*? stops at the first ]]>
$str = preg_replace('/<!\[CDATA\[(.*?)\]\]>/s', '$1', $cdata);

# make a new dom with just the cdata html and query that
$html_dom = str_get_html($str);
echo $html_dom->find('img', 0)->src;

# http://d13yacurqjgara.cloudfront.net/users/14152/screenshots/1405514/sparrows.jpg
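
For comparison, the XPath route mentioned at the top of this answer might look roughly like the sketch below. It uses only PHP's bundled DOM extension. The `<rss>/<channel>/<item>` wrapper and the shortened description are assumptions made for illustration; a real feed would be loaded from its URL instead.

```php
<?php
// Sample feed fragment; the wrapper elements are assumed for illustration.
$xml = <<<'FEED'
<rss version="2.0"><channel><item>
<description><![CDATA[<a href="http://dribbble.com/shots/1405514-sparrows"><img alt="sparrows" src="http://d13yacurqjgara.cloudfront.net/users/14152/screenshots/1405514/sparrows.jpg" /></a><p>A place for creatives to gather some useful readings.</p>]]></description>
</item></channel></rss>
FEED;

// An XML parser resolves CDATA itself, so textContent is already the raw HTML.
$feed = new DOMDocument();
$feed->loadXML($xml);
$description = $feed->getElementsByTagName('description')->item(0)->textContent;

// Parse the HTML fragment; libxml warnings about loose markup are collected, not printed.
libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($description);
libxml_clear_errors();

// Query the fragment: first image source and the text of the first paragraph.
$xpath = new DOMXPath($html);
$src   = $xpath->evaluate('string(//img/@src)');
$text  = trim($xpath->evaluate('string(//p)'));

echo $src, "\n", $text, "\n";
```

Because the feed is parsed as XML first, no regular expression is needed to remove the CDATA wrapper.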

Answer 2

I used lastRSS's built-in option to strip the CDATA markers while keeping the rest of the HTML.

This is done as follows:

// Create lastRSS object
$rss = new lastRSS;

$rss->CDATA = 'content';
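
Once the CDATA markers are gone, the description is an ordinary HTML string, and the image links and text still have to be pulled out of it. The following is a minimal sketch of one way to do that. `split_description` is a hypothetical helper, not part of lastRSS, and the sample string is only an assumption about what a description looks like after CDATA removal.

```php
<?php
// Hypothetical helper (not part of lastRSS): split description HTML
// into a list of image URLs and the remaining plain text.
function split_description(string $html): array
{
    if (trim($html) === '') {
        return ['images' => [], 'text' => ''];
    }

    libxml_use_internal_errors(true);   // tolerate loose feed markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $images = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $images[] = $img->getAttribute('src');
    }

    // strip_tags drops the markup; html_entity_decode restores characters such as &amp;.
    $text = trim(html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8'));

    return ['images' => $images, 'text' => $text];
}

// Assumed shape of a description after the CDATA markers were removed.
$description = '<a href="http://dribbble.com/shots/1405514-sparrows"><img alt="sparrows" src="http://d13yacurqjgara.cloudfront.net/users/14152/screenshots/1405514/sparrows.jpg" /></a><p>A place for creatives to gather some useful readings.</p>';

$parts = split_description($description);
print_r($parts);
```

Returning every `img` rather than only the first keeps the helper usable for feeds that embed several images per item.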

How to parse the RSS description tag with PHP
