XML parsing with PHP does not show the data of some tags

Time: 2012-09-18 06:30:50

Tags: php rss xml-parsing simplexml

I am trying to parse an RSS feed from a link. Here is my code:

            $content = file_get_contents($this->feed);     
            print_r($content);   
            $rss = new SimpleXmlElement($content);
            print_r($rss);
            $rss_split = array();
           /* foreach ($rss->channel->item as $item) {
                $title = (string) $item->title; // Title
                $link = (string) $item->link; // Url Link
                $description = (string) $item->description; //Description               
                $rss_split[] = '<div><a href="' . $link . '" target="_blank" title="" >' . $title . ' </a><hr></div>';
            }*/

The full XML is downloaded from here: http://devilsworkshop.org/feed/

Here is an excerpt that illustrates the structure:

<item>
    <title>Windows 8 Appstore resembles a ghost town</title>
    <link>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/</link>
    <comments>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/#comments</comments>
    <pubDate>Tue, 18 Sep 2012 05:30:22 +0000</pubDate>
    <dc:creator>Vibin</dc:creator>
    <category><![CDATA[Analysis]]></category>
    <category><![CDATA[Windows 8]]></category>

    <guid isPermaLink="false">http://devilsworkshop.org/?p=62284</guid>
    <description><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it&#8217;s no more called Metro, but let&#8217;s keep it like that [...]</p><p>--
            This Post <a href="http://devilsworkshop.org/windows-appstore-resembles-ghost-town/">Windows 8 Appstore resembles a ghost town</a> is Published on <a href="http://devilsworkshop.org">Devils Workshop</a> .
        </p><h3>Related posts:</h3><ul>
            <li><a href='http://devilsworkshop.org/googles-new-look-resembles-yahoo-search/' rel='bookmark' title='Google&#8217;s new look resembles Yahoo Search'>Google&#8217;s new look resembles Yahoo Search</a></li>
        </ul>]]></description>
    <content:encoded><![CDATA[<p>Microsoft is all set to release Windows 8 for public in the coming weeks. Apparently, the biggest change in Windows 8 seems to be the Metro UI (I know it&#8217;s no more called Metro, but let&#8217;s keep it like that for simplicity) and apps.</p>
        <ul>
        <h2>Apps are less advanced</h2>
        <p>Metro is great on tablets, but on desktop, it looks like an OS with dumbed down apps. Take Skitch for example, it is an app for taking and editing screenshots and was previously a Mac-only app but recently came to Windows 8. Just compare these two apps and you&#8217;ll know what I meant.</p>
        <p>Here&#8217;s how Skitch looks in Windows 8:</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-full wp-image-62302" title="SkitchinWindows8" src="http://devilsworkshop.org/files/2012/09/SkitchinWindows8.png" alt="" width="740" height="570" /></a></p>
        <p>And now, this is the Mac version of Skitch:</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/SkitchinMac.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-full wp-image-62301" title="SkitchinMac" src="http://devilsworkshop.org/files/2012/09/SkitchinMac.png" alt="" width="671" height="575" /></a></p>
        <p>Another example can be Newsmix, an app which will let you read stuff that matters to you &#8211; in a Magazine layout. Apparently, this app is a fail for someone like me who subscribe to 50+ blogs.</p>
        <p><a href="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8.png"><img style=' display: block; margin-right: auto; margin-left: auto;'  class="aligncenter size-large wp-image-62305" title="NewsMix in Windows 8" src="http://devilsworkshop.org/files/2012/09/NewsmixinWindows8-1024x640.png" alt="news-mix-windows-8" width="620" height="387" /></a><br />
            Sure, it will be great on a Windows slate, but not really on a PC/laptop.</p>
        <li><a href='http://devilsworkshop.org/how-to-enable-hibernate-option-in-windows-vistawindows-7/' rel='bookmark' title='How to enable Hibernate Option in Windows Vista/Windows 7'>How to enable Hibernate Option in Windows Vista/Windows 7</a></li>
        <li><a href='http://devilsworkshop.org/windows-store/' rel='bookmark' title='Microsoft to Introduce Windows Store with Windows 8 Platform'>Microsoft to Introduce Windows Store with Windows 8 Platform</a></li>
        </ul>]]>
    </content:encoded>          
    <wfw:commentRss>http://devilsworkshop.org/windows-appstore-resembles-ghost-town/feed/</wfw:commentRss>
    <slash:comments>0</slash:comments>
</item>

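When I print $content, it shows the images inside the content:encoded tag. However, printing $rss does not show that tag at all, and the description tag shows up as SimpleXMLElement Object().

I want to parse both of these tags. What am I doing wrong?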

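3 answers:

Answer 0: (score: 2)

First, print_r() is not a good way to inspect how SimpleXML objects behave, because they are not "normal" PHP objects. You can try my simplexml_dump() function, which lists the content, children, and attributes of a particular node or list of nodes.

Second, the content:encoded element is in the namespace with the prefix content, so you need to tell SimpleXML to access nodes in that namespace rather than the default one, using the ->children() method. For example: echo $item->children('content', true)->encoded;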
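
As a rough illustration of the namespace point, here is a minimal sketch that runs against a small inline feed instead of the live one. The inline XML and the two xmlns declarations are assumptions (the excerpt in the question does not show the root element), and the variable names are made up for the example.

```php
<?php
// Hypothetical inline feed, trimmed from the excerpt in the question.
// The namespace URIs below are assumed; the real feed declares its own on <rss>.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Windows 8 Appstore resembles a ghost town</title>
      <dc:creator>Vibin</dc:creator>
      <content:encoded><![CDATA[<p>Metro is great on tablets.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$rss  = new SimpleXMLElement($xml);
$item = $rss->channel->item[0];

// Passing true as the second argument means the first argument is a prefix.
$encoded = (string) $item->children('content', true)->encoded;

// Without the second argument, the first argument is the namespace URI.
$creator = (string) $item->children('http://purl.org/dc/elements/1.1/')->creator;

// Plain property access only looks in the default namespace, so this is empty.
$plain = (string) $item->encoded;

echo $encoded, "\n", $creator, "\n";
```

Looking elements up by URI is usually the more robust choice, since a feed is free to bind a different prefix to the same namespace.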

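Answer 1: (score: 1)

Of course printing $rss does not show the data. It shows exactly what it is meant to, because $rss itself really is a SimpleXMLElement object.

Beyond that, however, as far as I can tell your XML document cannot be parsed because it is not valid UTF-8. After copying it to my client and combing through it, I found a bunch of \xA0 and \x92 characters.

After replacing them with the corresponding characters (a space and an apostrophe) and saving the document, it parsed fine.

This is definitely your problem.

The solution to this problem is as follows: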
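
To make that concrete, here is a small sketch using an inline document rather than the real feed (the XML and variable names are made up for the example). print_r() tends to show an element whose only content is a CDATA section as an empty SimpleXMLElement Object, but casting the element to a string still returns the text. Passing LIBXML_NOCDATA is an optional alternative that asks libxml to merge CDATA sections into ordinary text nodes.

```php
<?php
// Hypothetical one-item feed with a CDATA-wrapped description.
$xml = '<rss><channel><item>'
     . '<description><![CDATA[<p>Microsoft is all set</p>]]></description>'
     . '</item></channel></rss>';

$rss = new SimpleXMLElement($xml);

// The string cast returns the CDATA content, whatever print_r() displays.
$description = (string) $rss->channel->item->description;

// Optional: merge CDATA into text nodes at parse time.
$merged     = new SimpleXMLElement($xml, LIBXML_NOCDATA);
$fromMerged = (string) $merged->channel->item->description;

echo $description, "\n", $fromMerged, "\n";
```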

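// Raw bytes found in the feed: 0xA0 (non-breaking space), 0x92 (apostrophe), 0x96 (dash).
$char_arr = array('/\xa0/','/\x92/','/\x96/');
// Replacements, in the same order as the patterns above.
$rep_arr = array('&nbsp;','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);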

Make sure to place this code before creating the SimpleXML object:

$content = file_get_contents($this->feed);     
print_r($content);
$char_arr = array('/\xa0/','/\x92/','/\x96/');
$rep_arr = array('&nbsp;','\'','-');
$content = preg_replace($char_arr, $rep_arr, $content);
$rss = new SimpleXmlElement($content);

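This should solve your problem; I tested it myself and it works on my end.

Answer 2: (score: 0)

Thanks to IMSoP's answer, I went straight to http://php.net/simplexml, where I found and used the xmlObjToArr($obj) function by xaviered_at gmail_dot_com to solve the same problem.

For those still looking for a simple way to get the contents of the content:encoded tag, here is a short, straightforward script: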
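
A possible alternative to replacing individual bytes, sketched under the assumption that the whole document is in a single legacy encoding such as Windows-1252 (where 0x92 is a right single quote and 0x96 is an en dash): convert the document to UTF-8 before parsing. The helper name feedToUtf8 and its default encoding are hypothetical. This would double-encode a feed that is mostly valid UTF-8 with only a few stray bytes, so it is not a drop-in replacement for the code above.

```php
<?php
/**
 * Hypothetical helper: return $raw unchanged if it is already valid UTF-8,
 * otherwise convert it from an assumed single-byte encoding.
 */
function feedToUtf8(string $raw, string $assumedEncoding = 'Windows-1252'): string
{
    // preg_match() with the /u modifier returns 1 only for a valid UTF-8 subject.
    if (preg_match('//u', $raw) === 1) {
        return $raw;
    }

    $converted = iconv($assumedEncoding, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Could not convert feed from $assumedEncoding");
    }
    return $converted;
}

// Made-up sample containing the stray bytes mentioned in the answer.
$raw = "<rss><channel><item><title>It\x92s here \x96 now</title></item></channel></rss>";

$rss   = new SimpleXMLElement(feedToUtf8($raw));
$title = (string) $rss->channel->item->title;

echo $title, "\n";
```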

<?php

echo "<pre>";

$url = "http://devilsworkshop.org/feed/";
$rss = simplexml_load_file($url);

if($rss){

    $items = $rss->channel->item;

    foreach($items as $item){

        $title = $item->title;
        $image = $item->image;
        $link = $item->link;
        $published_on = $item->pubDate;
        $description = $item->description;

        // convert the <content:encoded> element from a SimpleXMLElement object into an array
        $content = xmlObjToArr($item->children('content', true)->encoded);


        echo "

        title: $title
        image: $image
        link: $link
        published on: $published_on
        description: $description
        content: 
        ";

        print_r($content);

    }
}


function xmlObjToArr($obj) {
        $namespace = $obj->getDocNamespaces(true);
        $namespace[NULL] = NULL;

        $children = array();
        $attributes = array();
        $name = strtolower((string)$obj->getName());

        $text = trim((string)$obj);
        if( strlen($text) <= 0 ) {
            $text = NULL;
        }

        // get info for all namespaces
        if(is_object($obj)) {
            foreach( $namespace as $ns=>$nsUrl ) {
                // attributes
                $objAttributes = $obj->attributes($ns, true);
                foreach( $objAttributes as $attributeName => $attributeValue ) {
                    $attribName = strtolower(trim((string)$attributeName));
                    $attribVal = trim((string)$attributeValue);
                    if (!empty($ns)) {
                        $attribName = $ns . ':' . $attribName;
                    }
                    $attributes[$attribName] = $attribVal;
                }

                // children
                $objChildren = $obj->children($ns, true);
                foreach( $objChildren as $childName=>$child ) {
                    $childName = strtolower((string)$childName);
                    if( !empty($ns) ) {
                        $childName = $ns.':'.$childName;
                    }
                    $children[$childName][] = xmlObjToArr($child);
                }
            }
        }

        return array(
            'name'=>$name,
            'text'=>$text,
            'attributes'=>$attributes,
            'children'=>$children
        );
    }


?>