I have an XML page like this:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
I want to get the src attribute of the img elements above.
The images are stored in the XML, but they are not in the form <description><img... .
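Instead, they appear as &lt;img src="https://scontent.xx.fbc... (the < is replaced by &lt;). I guess that's why $dom->getElementsByTagName('img') doesn't return anything.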
Is there any workaround?
This is what I'm calling:
$imgs = $dom->getElementsByTagName('img');
Any ideas?
Answer (score: 1)
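You have HTML embedded inside your XML tags, so you have to retrieve the XML nodes, load the content of each one as HTML, and then read the attribute you want from the right tag.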
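To see why the call in the question comes back empty, here is a minimal sketch that loads a trimmed one-item copy of the feed and inspects what the XML parser actually builds. It assumes the HTML inside <description> is entity-escaped (&lt;img ...) as the question describes; the variable names are illustrative only. The escaped markup ends up as plain character data, so the XML tree contains no img element, while the description's nodeValue is already decoded back into ordinary HTML text.

```php
<?php
// Trimmed one-item copy of the feed, assuming the HTML inside <description>
// is entity-escaped (&lt;img ...) as described in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<item>
<title>Photo - Who is the Best Khan ?</title>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
</item>
</channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// The escaped markup is character data, so the XML tree has no <img> element.
$imgCount = $dom->getElementsByTagName('img')->length;

// <description> holds a text node; nodeValue is already entity-decoded to HTML.
$description = $dom->getElementsByTagName('description')->item(0);
$decoded = $description->nodeValue;

echo "img elements in the XML tree: $imgCount\n";
echo "decoded description starts with: ", substr($decoded, 0, 9), "\n";
```

This is why the answer parses each description a second time, as HTML.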
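Your XML contains more than one <description> node (the <channel> has its own as well), so ->getElementsByTagName would return more nodes than you want. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree: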
$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadXML( $xml );
$dom->formatOutput = True;
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );
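The difference between the two lookups can be checked directly. The sketch below, assuming the feed from the question with its item HTML entity-escaped, counts the <description> nodes each approach returns: getElementsByTagName also picks up the channel-level description, while the relative XPath query (evaluated against the root <rss> element when no context node is given) matches only the ones inside an <item>.

```php
<?php
// The feed from the question, assuming the item HTML is entity-escaped.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
XML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML($xml);

// Every <description> in the document, including the channel's own one.
$all = $dom->getElementsByTagName('description');

// With no context node, a relative query is evaluated against the root <rss>
// element, so this matches only the descriptions that belong to an <item>.
$xpath = new DOMXPath($dom);
$itemDescriptions = $xpath->query('channel/item/description');

echo "getElementsByTagName('description'): {$all->length}\n";
echo "XPath channel/item/description: {$itemDescriptions->length}\n";
```

The channel description holds only plain text, so parsing it as HTML would yield no img element; filtering it out up front keeps the later loop simple.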
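Then iterate over all the nodes, load each node value into a new DOMDocument (there is no need to decode the HTML entities; DOM has already decoded them for you), and extract the src attribute from the <img> node: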
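$srcs = array();  // collect every image URL instead of overwriting one variable
foreach( $nodes as $node )
{
    $html = new DOMDocument();
    $html->loadHTML( $node->nodeValue );
    // item(0) is null when a description contains no <img>
    $img = $html->getElementsByTagName( 'img' )->item(0);
    if( $img !== null ) $srcs[] = $img->getAttribute('src');
}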
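Putting both steps together, here is a minimal end-to-end sketch, assuming the item HTML is entity-escaped as in the question and PHP 7 or later (for the scalar type declarations). The function name extractImageSources is hypothetical. It restores the caller's libxml error setting afterwards and skips empty descriptions, because DOMDocument::loadHTML() rejects an empty string (PHP 8 throws a ValueError; PHP 7 warns and returns false).

```php
<?php
// Hypothetical helper: returns the src of the first <img> in each item's <description>.
function extractImageSources(string $xml): array
{
    // Collect libxml warnings (e.g. from loose HTML) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $sources = [];

    $dom = new DOMDocument();
    if ($dom->loadXML($xml)) {
        $xpath = new DOMXPath($dom);
        foreach ($xpath->query('channel/item/description') as $node) {
            // nodeValue is the entity-decoded HTML fragment.
            $markup = trim($node->nodeValue);
            if ($markup === '') {
                continue; // loadHTML() does not accept empty input
            }
            $html = new DOMDocument();
            $html->loadHTML($markup);
            $img = $html->getElementsByTagName('img')->item(0);
            if ($img !== null) {
                $sources[] = $img->getAttribute('src');
            }
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $sources;
}

// The feed from the question, assuming the item HTML is entity-escaped.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<description>FB-RSS feed for Salman Khan Fc</description>
<item>
<title>Photo - Who is the Best Khan ?</title>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
</item>
<item>
<title>Photo</title>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
</item>
</channel>
</rss>
XML;

// Print one image URL per item.
foreach (extractImageSources($xml) as $src) {
    echo $src, "\n";
}
```

Returning an array rather than echoing inside the loop keeps the parsing separate from whatever the caller does with the URLs.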