How do I get a specific field from an XML tag?

Date: 2014-09-09 08:43:03

Tags: php html xml json

The XML I'm working with looks like this:

     <item>
     <title>$39.99 and Under Juniors' Swimwear</title>
     <link>http://www.amazon.com/s/ref=xs_gb_rss_A1RFRNENBWTVO4/?rh=n:1036592,n:!2334084011,n:!2334146011,n:8021415011,p_6:ATVPDKIKX0DER&amp;bbn=8021415011&amp;ie=UTF8&amp;qid=1398271335&amp;rnid=15683531&amp;ccmID=380205&amp;tag=bugash-20</link>
     <description>&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;a rel="nofollow" target="_blank" href="http://www.amazon.com/s/ref=xs_gb_rss_A1RFRNENBWTVO4/?rh=n:1036592,n:!2334084011,n:!2334146011,n:8021415011,p_6:ATVPDKIKX0DER&amp;bbn=8021415011&amp;ie=UTF8&amp;qid=1398271335&amp;rnid=15683531&amp;ccmID=380205&amp;tag=rssfeeds-20"&gt;&lt;img src="http://ecx.images-amazon.com/images/I/31kwUz5PiZL._SL160_.jpg" alt="Product Image" style='border:0;'/&gt;&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;tr&gt;&lt;td&gt;$39.99 and Under Juniors' Swimwear&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Expires May 10, 2014&lt;/td&gt;&lt;/tr&gt;&lt;/tr&gt;&lt;/table&gt;</description>
     <guid isPermaLink="false">http://promotions.amazon.com/gp/goldbox/5159412---5opWoFoLiIfWceLGIhXzm2wwCMk=</guid>
     <pubDate>Sat, 26 Apr 2014 07:00:00 +0000</pubDate>
  </item>

I want to extract the `img src` value from inside the `description` tag. How do I do this in PHP?

2 answers:

Answer 0 (score: 0)

You can combine SimpleXML, to read the XML, with DOMDocument, to parse the HTML embedded in `description`. For example:

$xml_string = <<<XML
<item>
     <title>$39.99 and Under Juniors' Swimwear</title>
     <link>http://www.amazon.com/s/ref=xs_gb_rss_A1RFRNENBWTVO4/?rh=n:1036592,n:!2334084011,n:!2334146011,n:8021415011,p_6:ATVPDKIKX0DER&amp;bbn=8021415011&amp;ie=UTF8&amp;qid=1398271335&amp;rnid=15683531&amp;ccmID=380205&amp;tag=bugash-20</link>
     <description>&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;a rel="nofollow" target="_blank" href="http://www.amazon.com/s/ref=xs_gb_rss_A1RFRNENBWTVO4/?rh=n:1036592,n:!2334084011,n:!2334146011,n:8021415011,p_6:ATVPDKIKX0DER&amp;bbn=8021415011&amp;ie=UTF8&amp;qid=1398271335&amp;rnid=15683531&amp;ccmID=380205&amp;tag=rssfeeds-20"&gt;&lt;img src="http://ecx.images-amazon.com/images/I/31kwUz5PiZL._SL160_.jpg" alt="Product Image" style='border:0;'/&gt;&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;tr&gt;&lt;td&gt;$39.99 and Under Juniors' Swimwear&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Expires May 10, 2014&lt;/td&gt;&lt;/tr&gt;&lt;/tr&gt;&lt;/table&gt;</description>
     <guid isPermaLink="false">http://promotions.amazon.com/gp/goldbox/5159412---5opWoFoLiIfWceLGIhXzm2wwCMk=</guid>
     <pubDate>Sat, 26 Apr 2014 07:00:00 +0000</pubDate>
  </item>
XML;

$xml = simplexml_load_string($xml_string); // or simplexml_load_file('path/to/file.xml');
$description = (string) $xml->description;

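$dom = new DOMDocument();
// The embedded HTML is not well-formed, so collect libxml warnings
// instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($description);
libxml_clear_errors();

// item(0) returns null when the description contains no <img>.
$img = $dom->getElementsByTagName('img')->item(0);
$src = $img !== null ? $img->getAttribute('src') : null;
echo $src; // http://ecx.images-amazon.com/images/I/31kwUz5PiZL._SL160_.jpg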
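Real feeds usually contain many `<item>` elements, not just one. Below is a minimal sketch that applies the same SimpleXML + DOMDocument idea to every item and uses DOMXPath to pick the first image. It assumes the items sit under the usual RSS 2.0 `<rss><channel>` wrapper, which the question does not actually show. The function name `extractImageUrls` is hypothetical.

```php
<?php
// Hypothetical helper: returns the first <img src> found in each <item>'s
// <description>. Assumes an RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>.
function extractImageUrls(string $feedXml): array
{
    $feed = simplexml_load_string($feedXml);
    if ($feed === false) {
        return []; // not parseable as XML
    }

    $urls = [];
    foreach ($feed->channel->item as $item) {
        // Casting to string decodes &lt; / &amp; back into raw HTML.
        $html = (string) $item->description;
        if ($html === '') {
            continue; // loadHTML() rejects empty input
        }

        $dom = new DOMDocument();
        // The embedded HTML may be malformed; collect warnings instead of printing them,
        // then restore the caller's previous setting.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // First <img> in document order; evaluates to '' when there is none.
        $xpath = new DOMXPath($dom);
        $src = $xpath->evaluate('string((//img)[1]/@src)');
        if ($src !== '') {
            $urls[] = $src;
        }
    }
    return $urls;
}
```

Items whose description contains no `<img>` are skipped, so the returned array can be shorter than the number of items.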

Answer 1 (score: 0)

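// SimpleXMLElement has no getElementsByTagName(); that method belongs to DOMDocument.
$doc = new DOMDocument();
$doc->loadXML($xmlstring);
// nodeValue is entity-decoded, so it holds the raw HTML including src="..."
$imgString = $doc->getElementsByTagName('description')->item(0)->nodeValue;
// Split on the img's src attribute (href= would return the link URL instead)
$explodedFirst = explode('src="', $imgString);
$firstSplit = $explodedFirst[1];
$explodedLast = explode('"', $firstSplit);
$finalURL = $explodedLast[0]; // http://ecx.images-amazon.com/images/I/31kwUz5PiZL._SL160_.jpg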
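If you prefer plain string matching over an HTML parser, `preg_match` is a little sturdier than chained `explode()` calls. It tells you whether anything matched, while `explode()` leaves you indexing past the end of the array when the marker is missing. Below is a minimal sketch. It assumes the `src` value is double-quoted, as in the sample feed, and `firstImgSrc` is a hypothetical name. A regex will still miss single-quoted or unusual markup that DOMDocument handles.

```php
<?php
// Hypothetical helper: returns the src of the first <img> in an HTML fragment,
// or null if none is found. Assumes src="..." uses double quotes.
function firstImgSrc(string $html): ?string
{
    // \s before "src" avoids matching attributes such as data-src="...".
    if (preg_match('/<img\b[^>]*?\ssrc="([^"]*)"/i', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}
```

Pass it the decoded `description` text, for example `firstImgSrc((string) $xml->description)`.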