PHP: get img src from XML

Date: 2016-04-16 15:42:53

Tags: php xml image tags domdocument

I have an XML page like the following:

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
  <channel>
    <title>FB-RSS feed for Salman Khan  Fc</title>
    <link>http://facebook.com/profile.php?id=1636293749919827/</link>
    <description>FB-RSS feed for Salman Khan  Fc</description>
    <managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
    <pubDate>31 Mar 16 20:00 +0000</pubDate>
    <item>
      <title>Photo - Who is the Best Khan ?</title>
      <link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
      <author>FB-RSS</author>
      <guid>1636293749919827_1713146978901170</guid>
      <pubDate>31 Mar 16 20:00 +0000</pubDate>
    </item>
    <item>
      <title>Photo</title>
      <link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
      <author>FB-RSS</author>
      <guid>1636293749919827_1713146755567859</guid>
      <pubDate>31 Mar 16 19:58 +0000</pubDate>
    </item>
  </channel>
</rss>

I want to get the src of each img in the XML above.

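The images are stored in the XML inside <description>, but they are not in the form <img...

They look like this:

&lt;img src=&#34;https://scontent.xx.fbc...

In other words, < has been replaced with &lt;. I guess that is why my code returns nothing.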

Is there any workaround?

This is what I am calling:

$imgs = $dom->getElementsByTagName('img');

Any ideas?

1 answer

Answer 0 (score: 1)

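You have HTML embedded inside an XML element, so you have to retrieve the XML nodes, load the HTML contained in each one, and then read the tag attribute you need.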
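To see why getElementsByTagName('img') in the question finds nothing, here is a minimal sketch. It assumes an abridged copy of the feed with a single <item> (the $xml string is trimmed from the question, not the full feed). It shows that <description> has no <img> element child at all, only a text node whose value is the already-decoded HTML:

```php
<?php
// Abridged copy of the feed: one <item> whose <description> holds entity-escaped HTML.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
  <channel>
    <item>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// The escaped markup is character data, so the XML tree contains no <img> element.
$imgCount = $dom->getElementsByTagName('img')->length;

$description = $dom->getElementsByTagName('description')->item(0);
// The child of <description> is a text node, not an element.
$childType = $description->firstChild->nodeType;
// The XML parser has already turned &lt; and &#34; back into < and ".
$decoded = $description->nodeValue;

echo $imgCount, "\n";              // 0
echo substr($decoded, 0, 9), "\n"; // <a href="
```

Because the HTML only exists as a string at this point, it has to be parsed a second time (with loadHTML) before any <img> element can be found.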

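Your XML contains several <description> nodes (one directly under <channel> and one in each <item>), so ->getElementsByTagName would return more nodes than you want. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree: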

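// $xml holds the RSS string from the question
$dom = new DOMDocument();
libxml_use_internal_errors( true );  // collect parse warnings instead of printing them
$dom->loadXML( $xml );
$dom->formatOutput = true;

// Without a context node, the query is evaluated relative to the root element <rss>
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );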
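One way to check the point about duplicate <description> nodes is to count them both ways. This sketch uses a hypothetical, abridged feed: the structure matches the question, but the item descriptions are placeholder text because only the element count matters here:

```php
<?php
// Hypothetical abridged feed: same structure as the question, placeholder item text.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
  <channel>
    <title>FB-RSS feed for Salman Khan  Fc</title>
    <description>FB-RSS feed for Salman Khan  Fc</description>
    <item><description>placeholder 1</description></item>
    <item><description>placeholder 2</description></item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Matches every <description>, including the one directly under <channel>.
$allDescriptions = $dom->getElementsByTagName('description')->length;

// DOMXPath::query() without a context node is relative to the root element (<rss>),
// so this path only reaches the <description> inside each <item>.
$xpath = new DOMXPath($dom);
$itemDescriptions = $xpath->query('channel/item/description')->length;

echo $allDescriptions, "\n";  // 3
echo $itemDescriptions, "\n"; // 2
```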

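Then iterate over all the nodes, load each node value into a new DOMDocument (there is no need to decode the HTML entities; DOM has already decoded them for you), and extract the src attribute from the <img> node: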

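$srcs = array();  // one entry per <img> found
foreach( $nodes as $node )
{
    $html = new DOMDocument();
    // nodeValue is already decoded, e.g. <a href="..."><img src="..."></a>
    $html->loadHTML( $node->nodeValue );
    $img = $html->getElementsByTagName( 'img' )->item( 0 );
    if( $img !== null )  // skip descriptions without an image
        $srcs[] = $img->getAttribute( 'src' );
}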
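Putting the two steps together, here is a self-contained sketch that returns every image URL as an array. The function name extractImageSources() is hypothetical, and the sample feed is the question's two <item> elements plus the channel <description>, with the other channel metadata trimmed. It also skips descriptions that are empty or contain no <img>:

```php
<?php
// Hypothetical helper: collect the src of the first <img> in each item's description.
function extractImageSources(string $xml): array
{
    libxml_use_internal_errors(true); // keep libxml warnings out of the output

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    $sources = [];
    foreach ($xpath->query('channel/item/description') as $node) {
        $markup = trim($node->nodeValue); // entities already decoded by the XML parser
        if ($markup === '') {
            continue; // loadHTML() rejects empty input
        }
        $html = new DOMDocument();
        $html->loadHTML($markup);
        $img = $html->getElementsByTagName('img')->item(0);
        if ($img !== null) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Abridged copy of the question's feed.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
  <channel>
    <description>FB-RSS feed for Salman Khan  Fc</description>
    <item>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
    </item>
    <item>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
    </item>
  </channel>
</rss>
XML;

$sources = extractImageSources($xml);
print_r($sources);
```

Returning an array rather than overwriting a single variable keeps one URL per item; the channel-level <description> is never visited because the XPath only walks channel/item/description.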
