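Trying to parse images and text from an RSS feed

Time: 2014-04-01 19:45:15

Tags: php html xpath

This is a continuation of the topic here: Trying to Parse Only the Images from an RSS Feed

This time I want to parse the images and certain other elements from an RSS feed. A sample of the RSS feed looks like this: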

 <channel>
 <atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />

 <item>
 <title>Article One</title>
 <guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
 <link>http://mywebsite.com/geturl/e8c5106</link>
 <comments>http://mywebsite.com/details/e8c5106#comments</comments>     
 <pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate> 
 <category>Category 1</category>    
 <description>
      <![CDATA[<div>
      <img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0"  />  
      <ul><li>Poster: someone's name;</li>
      <li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
      <li>Rating: 5</li>
      <li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
      </description>
 </item> 
 <item>..

I tried the code below to parse the images and text:

$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1');

$descriptions = $xml->xpath('//item/description');
$mytitle= $xml->xpath('//item/title');

foreach ( $descriptions as $description_node ) {
   // The description may not be valid XML, so use a more forgiving HTML parser mode
   $description_dom = new DOMDocument();
   $description_dom->loadHTML( (string)$description_node );

   // Switch back to SimpleXML for readability
   $description_sxml = simplexml_import_dom( $description_dom );

   // Find all images, and extract their 'src' param
   $imgs = $description_sxml->xpath('//img');
   foreach($imgs as $image) {
        echo "<img id=poster class=poster src={$image['src']}> {$mytitle}";
        }
    }

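The code above extracts the images beautifully. However, the last line of the code does not output the $mytitle value (which would be "Article One" for the first item). The title should be extracted for every item in the RSS feed.

Can anyone help me solve this?

Many thanks,

Hernando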

1 Answer:

Answer 0 (score: 1)

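xpath() returns an array whenever the query succeeds (see http://www.php.net/manual/en/simplexmlelement.xpath.php), even if the result contains only one element. If you know you need just one element, simply use $mytitle[0].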
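To make the array behaviour concrete, here is a minimal sketch. The inline two-item feed and the variable names `$titles`, `$count`, and `$first` are illustrative assumptions, not part of the original question.

```php
<?php
// Hypothetical inline feed standing in for the real RSS source.
$xml = simplexml_load_string(
    '<rss><channel>'
    . '<item><title>Article One</title></item>'
    . '<item><title>Article Two</title></item>'
    . '</channel></rss>'
);

// xpath() returns an array of SimpleXMLElement objects, one per match.
$titles = $xml->xpath('//item/title');

$count = count($titles);       // number of matched <title> nodes
$first = (string) $titles[0];  // cast a single element to get its text

echo $count, ' titles, first: ', $first, "\n";
```

Interpolating the array itself into a string, as the question's echo does with {$mytitle}, does not print any title text, which is why nothing useful appeared.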

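You have to iterate over each <item/> element; otherwise you cannot know which description and which title belong together. So the following should work: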

$xml = simplexml_load_file('test.xml');

$items = $xml->xpath('//item');

foreach ( $items as $item ) {
    $descriptions = $item->description;
    $mytitle = $item->title;
    foreach ( $descriptions as $description_node ) {
        // The description may not be valid XML, so use a more forgiving HTML parser mode
        $description_dom = new DOMDocument();
        $description_dom->loadHTML( (string)$description_node );

        // Switch back to SimpleXML for readability
        $description_sxml = simplexml_import_dom( $description_dom );

        // Find all images, and extract their 'src' param
        $imgs = $description_sxml->xpath('//img');
        foreach ( $imgs as $image ) {
            echo "<img id=\"poster\" class=\"poster\" src=\"{$image['src']}\"> {$mytitle}";
        }
    }
}

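By the way, I also added quotes ("") around the attribute values in your <img/> element. I assume you want them, since the output looks very much like XML/HTML.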
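As a possible variation on the loop above, the sketch below collects the title/image pairs first and prints them afterwards. The helper name `extractPosters`, the inline sample feed, the use of `libxml_use_internal_errors()` to quiet warnings from imperfect HTML fragments, and the `htmlspecialchars()` escaping are all my own assumptions rather than part of the original answer.

```php
<?php
// Hypothetical helper: pairs each item's title with the image URLs
// found in that item's description.
function extractPosters(SimpleXMLElement $feed): array
{
    $pairs = [];
    // Collect libxml warnings internally instead of printing them
    // (the sample description contains an unclosed <div>).
    $previous = libxml_use_internal_errors(true);

    foreach ($feed->xpath('//item') as $item) {
        $html = trim((string) $item->description);
        if ($html === '') {
            continue; // nothing to parse for this item
        }
        $dom = new DOMDocument();
        if (!$dom->loadHTML($html)) {
            continue; // skip descriptions the HTML parser rejects
        }
        foreach ($dom->getElementsByTagName('img') as $img) {
            $pairs[] = [
                'title' => (string) $item->title,
                'src'   => $img->getAttribute('src'),
            ];
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $pairs;
}

// Inline sample modelled on the feed excerpt in the question.
$feed = simplexml_load_string(<<<'XML'
<rss><channel>
  <item>
    <title>Article One</title>
    <description><![CDATA[<div><img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" /></div><div style="clear:both;">]]></description>
  </item>
</channel></rss>
XML
);

$pairs = extractPosters($feed);
foreach ($pairs as $pair) {
    // Escape values before placing them into HTML output.
    printf(
        "<img class=\"poster\" src=\"%s\"> %s\n",
        htmlspecialchars($pair['src']),
        htmlspecialchars($pair['title'])
    );
}
```

This sketch drops the id="poster" attribute because an id repeated on every image would no longer be unique within the page; keep it if your markup relies on it.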