Question

I am trying to parse the first 15 videos of a YouTube feed. An excerpt of the feed I am trying to parse looks like this:

<entry>
    <title>The Title</title>
    <link href="http://example.com" />
    <media:thumbnail url="http://example.com/image.png" />
    <media:description>The Description</media:description>
    <media:statistics views="123456" />
    <pubDate>29/01/2017</pubDate>
</entry>

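I can't capture any values from the tags that start with <media:. I am using the following code to parse the data; the commented-out lines are the ones that don't work.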

foreach ($xml->entry as $val) {
    echo "<item>".PHP_EOL;
    echo "<title>".$val->title."</title>".PHP_EOL;
    echo "<link>".$val->link["href"]."</link>".PHP_EOL;
    //echo "<image>".$val->media:thumbnail["url"]."</image>".PHP_EOL;
    //echo "<description>".$val->media:description."</description>".PHP_EOL;
    //echo "<views>".$val->media:statistics["views"]."</views>".PHP_EOL;
    echo "<pubDate>".$val->published."</pubDate>".PHP_EOL;
    echo "</item>".PHP_EOL;
}

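How can I get the values of these tags without setting up namespaces? Running var_dump on $xml->entry doesn't even show the namespaced elements. Is there a better built-in function for converting XML to an array?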

Answer 1

I got the answer from the code provided by IMSoP. The PHP snippet I ended up using is adapted from that link and works on XML similar to the OP's:

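// Namespace URIs as declared on the feed's root element ($xml is the loaded SimpleXMLElement)
define('NS_ATOM', 'http://www.w3.org/2005/Atom');
define('NS_MEDIA', 'http://search.yahoo.com/mrss/');
define('NS_YT', 'http://www.youtube.com/xml/schemas/2015');

foreach ($xml->children(NS_ATOM)->entry as $entry) {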
    echo "<item>".PHP_EOL;
    echo "<title>".$entry->title."</title>".PHP_EOL;
    echo "<link>".$entry->link->attributes(null)->href."</link>".PHP_EOL;
    echo "<image>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->thumbnail->attributes(null)->url."</image>".PHP_EOL;
    echo "<description>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->description."</description>".PHP_EOL;
    echo "<guid>".$entry->children(NS_YT)->videoId."</guid>".PHP_EOL;
    echo "<views>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->community->children(NS_MEDIA)->statistics->attributes(null)->views."</views>".PHP_EOL;
    echo "<pubDate>".$entry->published."</pubDate>".PHP_EOL;
    echo "</item>".PHP_EOL;
}

Hope this helps someone in the future. It is the simplest example of XML namespace parsing I have come across so far.
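For anyone who wants to try this without a live feed, here is a minimal self-contained sketch of the same children()/attributes() approach. It assumes the excerpt from the question is wrapped in a <feed> root that declares the Atom and Media RSS namespaces (the real YouTube feed nests the media elements more deeply, as in the snippet above); the $xmlString and $items names are only for illustration.

```php
<?php
// Assumption: the question's excerpt wrapped in a <feed> root that declares
// the default Atom namespace and the "media" prefix.
$xmlString = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title>The Title</title>
    <link href="http://example.com" />
    <media:thumbnail url="http://example.com/image.png" />
    <media:description>The Description</media:description>
    <media:statistics views="123456" />
  </entry>
</feed>
XML;

$feed = new SimpleXMLElement($xmlString);

$items = [];
foreach ($feed->children('http://www.w3.org/2005/Atom')->entry as $entry) {
    // Switch to the Media RSS namespace to see the <media:...> children.
    $media = $entry->children('http://search.yahoo.com/mrss/');

    $items[] = [
        'title'       => (string) $entry->title,
        // attributes() with no argument selects the un-namespaced attributes.
        'link'        => (string) $entry->link->attributes()->href,
        'image'       => (string) $media->thumbnail->attributes()->url,
        'description' => (string) $media->description,
        'views'       => (int) $media->statistics->attributes()->views,
    ];
}

print_r($items);
```

Casting to string or int is what turns the SimpleXMLElement objects into plain values, so the result is an ordinary PHP array, which is close to what the question asked for.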

Answer 2

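Consider XSLT, XPath's sibling, since you are really transforming the original XML rather than just parsing it to pick out values. With XSLT you don't need a foreach loop, and namespaces are handled properly.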

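In fact, as shown below, using the posted XML wrapped in a <feed ...> root element, XSLT is the fastest of the approaches compared here (the others being SimpleXML and XPath queries):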

SimpleXML (from @IMSoP)

$time_start = microtime(true);

$xml = file_get_contents('YoutubeFeed.xml');
$document = new SimpleXMLElement($xml);
define('NS_ATOM', 'http://www.w3.org/2005/Atom');
define('NS_MEDIA', 'http://search.yahoo.com/mrss/');

foreach ($document->children(NS_ATOM)->entry as $entry) {
    echo "<item>".PHP_EOL;
    echo "<title>".$entry->title."</title>".PHP_EOL;
    echo "<link>".$entry->link->attributes(null)->href."</link>".PHP_EOL;
    echo "<image>".$entry->children(NS_MEDIA)->thumbnail->attributes()->url."</image>".PHP_EOL;
    echo "<description>".$entry->children(NS_MEDIA)->description."</description>".PHP_EOL;
    echo "<guid>".$entry->children(NS_MEDIA)->guid."</guid>".PHP_EOL;
    echo "<views>".$entry->children(NS_MEDIA)->statistics->attributes()->views."</views>".PHP_EOL;
    echo "<pubDate>".$entry->published."</pubDate>".PHP_EOL;
    echo "</item>".PHP_EOL;
}

Timing

echo "SimpleXML: " . (microtime(true) - $time_start) ."\n";
# SimpleXML: 0.0014688968658447

XPath (from @ThW)

$time_start = microtime(true);

$xml = file_get_contents('YoutubeFeed.xml');
$document = new DOMDocument();
$document->loadXml($xml);

$xpath = new DOMXpath($document);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$xpath->registerNamespace('media', 'http://search.yahoo.com/mrss/');

foreach ($xpath->evaluate('//atom:entry') as $entry) {
   echo "<item>".PHP_EOL;
   echo "<title>". $xpath->evaluate('string(atom:title)', $entry)."</title>".PHP_EOL;
   echo "<link>". $xpath->evaluate('string(atom:link/@href)', $entry)."</link>".PHP_EOL;
   echo "<image>". $xpath->evaluate('string(media:thumbnail/@url)', $entry)."</image>".PHP_EOL;
   echo "<description>". $xpath->evaluate('string(media:description)', $entry)."</description>".PHP_EOL;
   echo "<guid>". $xpath->evaluate('string(media:guid)', $entry)."</guid>".PHP_EOL;
   echo "<views>".$xpath->evaluate('string(media:statistics/@views)', $entry)."</views>".PHP_EOL;
   // Element names are case-sensitive: the posted XML uses <pubDate>, not <pubdate>
   echo "<pubDate>". $xpath->evaluate('string(atom:pubDate)', $entry)."</pubDate>".PHP_EOL;
   echo "</item>".PHP_EOL;
}

Timing

echo "XPATH: " . (microtime(true) - $time_start) ."\n";
# XPATH: 0.0012829303741455

XSLT

$time_start = microtime(true);

$xml = file_get_contents('YoutubeFeed.xml');
$document = new DOMDocument();
$document->loadXml($xml);

$xslstr = '<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"
                exclude-result-prefixes="atom media">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

   <xsl:template match="atom:feed">
    <xsl:apply-templates select="atom:entry"/>
   </xsl:template>

   <xsl:template match="atom:entry">
      <item>
         <title><xsl:value-of select="atom:title"/></title>
         <link><xsl:value-of select="atom:link/@href"/></link>
         <image><xsl:value-of select="media:thumbnail/@url"/></image>
         <description><xsl:value-of select="media:description"/></description>
         <guid><xsl:value-of select="media:guid"/></guid>
         <views><xsl:value-of select="media:statistics/@views"/></views>
         <pubDate><xsl:value-of select="atom:pubDate"/></pubDate>
      </item>
  </xsl:template>
</xsl:stylesheet>';

$xsl = new DOMDocument;
$xsl->loadXML($xslstr);

// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 

// Transform XML source
$newXML = $proc->transformToXML($document);

// Echo string output
echo $newXML;

Timing

echo "XSLT: " . (microtime(true) - $time_start) ."\n";
# XSLT: 0.00098896026611328

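XSLT also scales better with more <entry> nodes. After copying the tag and its children until the file reached 500 lines, the timings (in seconds) were: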

# SimpleXML: 0.62154388427734

# XPATH: 0.68382000923157

# XSLT: 0.011976957321167
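A single microtime() measurement in the millisecond range is fairly noisy, so the numbers above are best read as rough indications. Below is a sketch of a slightly more robust harness that averages over many runs. It assumes PHP 7.3+ for hrtime(); the benchmark() helper and the tiny inline feed are hypothetical and not part of the original measurements.

```php
<?php
/**
 * Hypothetical helper: average wall-clock seconds per call of $fn.
 */
function benchmark(callable $fn, int $iterations = 100): float
{
    $fn(); // warm-up run, not counted

    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return (hrtime(true) - $start) / $iterations / 1e9;
}

// Assumption: a tiny stand-in feed; swap in the real file contents to compare.
$xml = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>The Title</title></entry></feed>';

$simpleXmlTime = benchmark(function () use ($xml) {
    $feed = new SimpleXMLElement($xml);
    foreach ($feed->children('http://www.w3.org/2005/Atom')->entry as $entry) {
        $title = (string) $entry->title;
    }
});

printf("SimpleXML: %.8f s per run\n", $simpleXmlTime);
```

The same closure pattern can wrap the XPath and XSLT variants so that all three are measured under identical conditions.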

How can I easily parse a namespaced XML document in PHP using SimpleXML?
