I'm trying to parse the YouTube feed for the first 15 videos. An excerpt of the feed I'm trying to parse looks like this:
<entry>
<title>The Title</title>
<link href="http://example.com" />
<media:thumbnail url="http://example.com/image.png" />
<media:description>The Description</media:description>
<media:statistics views="123456" />
<pubDate>29/01/2017</pubDate>
</entry>
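I can't capture any values from the tags that start with <media:. I'm using the following code to parse the data; the commented-out lines are the ones that don't work.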
foreach ($xml->entry as $val) {
echo "<item>".PHP_EOL;
echo "<title>".$val->title."</title>".PHP_EOL;
echo "<link>".$val->link["href"]."</link>".PHP_EOL;
//echo "<image>".$val->media:thumbnail["url"]."</image>".PHP_EOL;
//echo "<description>".$val->media:description."</description>".PHP_EOL;
//echo "<views>".$val->media:statistics["views"]."</views>".PHP_EOL;
echo "<pubDate>".$val->published."</pubDate>".PHP_EOL;
echo "</item>".PHP_EOL;
}
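How can I get the values of these tags without setting up namespaces? Running var_dump on $xml->entry doesn't even show the namespaced elements. Is there a better built-in function for converting XML to an array?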
Answer 0: (score: 0)
I got the answer from the code provided by IMSoP. The PHP snippet I ended up using is adapted from that link, using XML similar to the OP's:
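// Namespace URIs as declared on the root element of the YouTube feed
define('NS_ATOM', 'http://www.w3.org/2005/Atom');
define('NS_MEDIA', 'http://search.yahoo.com/mrss/');
define('NS_YT', 'http://www.youtube.com/xml/schemas/2015');
foreach ($xml->children(NS_ATOM)->entry as $entry) {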
echo "<item>".PHP_EOL;
echo "<title>".$entry->title."</title>".PHP_EOL;
echo "<link>".$entry->link->attributes(null)->href."</link>".PHP_EOL;
echo "<image>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->thumbnail->attributes(null)->url."</image>".PHP_EOL;
echo "<description>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->description."</description>".PHP_EOL;
echo "<guid>".$entry->children(NS_YT)->videoId."</guid>".PHP_EOL;
echo "<views>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->community->children(NS_MEDIA)->statistics->attributes(null)->views."</views>".PHP_EOL;
echo "<pubDate>".$entry->published."</pubDate>".PHP_EOL;
echo "</item>".PHP_EOL;
}
Hope this helps someone in the future. This is the simplest example of XML namespace parsing I've come across so far.
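For reference, here is a self-contained sketch of the same children()/attributes() approach that can be run without fetching a feed. The inline feed string and the $feedXml, $namespaces, and $items variable names are assumptions made for illustration; the XML mirrors the flat excerpt from the question rather than the nested media:group layout of the live feed. It also uses getDocNamespaces() to list the namespaces the document declares, which may help when var_dump appears to hide the prefixed elements.

```php
<?php
// Hypothetical, trimmed-down feed used only for illustration.
$feedXml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title>The Title</title>
    <link href="http://example.com" />
    <media:thumbnail url="http://example.com/image.png" />
    <media:description>The Description</media:description>
    <media:statistics views="123456" />
  </entry>
</feed>
XML;

const NS_ATOM  = 'http://www.w3.org/2005/Atom';
const NS_MEDIA = 'http://search.yahoo.com/mrss/';

$xml = new SimpleXMLElement($feedXml);

// Every namespace declared anywhere in the document, as prefix => URI.
// The default (unprefixed) namespace is stored under the empty-string key.
$namespaces = $xml->getDocNamespaces(true);

$items = [];
foreach ($xml->children(NS_ATOM)->entry as $entry) {
    // Switch to the media namespace once per entry.
    $media = $entry->children(NS_MEDIA);
    $items[] = [
        'title'       => (string) $entry->title,
        // attributes(null) selects the attributes that have no namespace.
        'link'        => (string) $entry->link->attributes(null)->href,
        'image'       => (string) $media->thumbnail->attributes(null)->url,
        'description' => (string) $media->description,
        'views'       => (int) $media->statistics->attributes(null)->views,
    ];
}

print_r($namespaces);
print_r($items);
```

Collecting the values into a plain array first, and echoing afterwards, keeps the namespace handling in one place; whether that suits a given script is a design choice, not a requirement.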
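Answer 1: (score: 0)
Consider XSLT, the sibling of XPath, since you are really transforming the original XML rather than parsing it to select values. With XSLT you need no foreach loop, and namespaces can be handled properly.
In fact, as shown below, using the posted XML wrapped in a <feed ...> root, XSLT is the fastest of the approaches mentioned (SimpleXML, and XPath queries with evaluate):
SimpleXML (from @IMSoP)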
$time_start = microtime(true);
$xml = file_get_contents('YoutubeFeed.xml');
$document = new SimpleXMLElement($xml);
define('NS_ATOM', 'http://www.w3.org/2005/Atom');
define('NS_MEDIA', 'http://search.yahoo.com/mrss/');
foreach ($document->children(NS_ATOM)->entry as $entry) {
echo "<item>".PHP_EOL;
echo "<title>".$entry->title."</title>".PHP_EOL;
echo "<link>".$entry->link->attributes(null)->href."</link>".PHP_EOL;
echo "<image>".$entry->children(NS_MEDIA)->thumbnail->attributes()->url."</image>".PHP_EOL;
echo "<description>".$entry->children(NS_MEDIA)->description."</description>".PHP_EOL;
echo "<guid>".$entry->children(NS_MEDIA)->guid."</guid>".PHP_EOL;
echo "<views>".$entry->children(NS_MEDIA)->statistics->attributes()->views."</views>".PHP_EOL;
echo "<pubDate>".$entry->published."</pubDate>".PHP_EOL;
echo "</item>".PHP_EOL;
}
Timing
echo "SimpleXML: " . (microtime(true) - $time_start) ."\n";
# SimpleXML: 0.0014688968658447
XPath (from @ThW)
$time_start = microtime(true);
$xml = file_get_contents('YoutubeFeed.xml');
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$xpath->registerNamespace('media', 'http://search.yahoo.com/mrss/');
foreach ($xpath->evaluate('//atom:entry') as $entry) {
echo "<item>".PHP_EOL;
echo "<title>". $xpath->evaluate('string(atom:title)', $entry)."</title>".PHP_EOL;
echo "<link>". $xpath->evaluate('string(atom:link/@href)', $entry)."</link>".PHP_EOL;
echo "<image>". $xpath->evaluate('string(media:thumbnail/@url)', $entry)."</image>".PHP_EOL;
echo "<description>". $xpath->evaluate('string(media:description)', $entry)."</description>".PHP_EOL;
echo "<guid>". $xpath->evaluate('string(media:guid)', $entry)."</guid>".PHP_EOL;
echo "<views>".$xpath->evaluate('string(media:statistics/@views)', $entry)."</views>".PHP_EOL;
echo "<pubDate>". $xpath->evaluate('string(atom:pubDate)', $entry)."</pubDate>".PHP_EOL;
echo "</item>".PHP_EOL;
}
Timing
echo "XPATH: " . (microtime(true) - $time_start) ."\n";
# XPATH: 0.0012829303741455
XSLT
$time_start = microtime(true);
$xml = file_get_contents('YoutubeFeed.xml');
$document = new DOMDocument();
$document->loadXml($xml);
$xslstr = '<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"
exclude-result-prefixes="atom media">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:template match="atom:feed">
<xsl:apply-templates select="atom:entry"/>
</xsl:template>
<xsl:template match="atom:entry">
<item>
<title><xsl:value-of select="atom:title"/></title>
<link><xsl:value-of select="atom:link/@href"/></link>
<image><xsl:value-of select="media:thumbnail/@url"/></image>
<description><xsl:value-of select="media:description"/></description>
<guid><xsl:value-of select="media:guid"/></guid>
<views><xsl:value-of select="media:statistics/@views"/></views>
<pubDate><xsl:value-of select="atom:pubDate"/></pubDate>
</item>
</xsl:template>
</xsl:stylesheet>';
$xsl = new DOMDocument;
$xsl->loadXML($xslstr);
// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
// Transform XML source
$newXML = $proc->transformToXML($document);
// Echo string output
echo $newXML;
Timing
echo "XSLT: " . (microtime(true) - $time_start) ."\n";
# XSLT: 0.00098896026611328
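The gap widens with more <entry> nodes: after copying the tag and its children until the file reached 500 lines, XSLT scales better. The figures below are in seconds: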
# SimpleXML: 0.62154388427734
# XPATH: 0.68382000923157
# XSLT: 0.011976957321167
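Single-run microtime() figures like the ones above can be noisy, so it may be worth averaging over many runs before drawing conclusions. The following is a minimal sketch of one way to do that; the averageSeconds() helper, the $sample document, and the run count are all hypothetical and not part of the original benchmark.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn over $runs calls.
function averageSeconds(callable $fn, int $runs = 100): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        ob_start();      // buffer anything $fn echoes so it is not printed
        $fn();
        ob_end_clean();  // discard the buffered output
    }
    return (microtime(true) - $start) / $runs;
}

// Example workload (an assumption): parse a tiny inline Atom document.
$sample = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>T</title></entry></feed>';

$avg = averageSeconds(function () use ($sample) {
    $doc = new SimpleXMLElement($sample);
    echo $doc->children('http://www.w3.org/2005/Atom')->entry->title;
});

printf("SimpleXML: %.6f s per run\n", $avg);
```

The same helper could wrap the XPath and XSLT variants so that all three are measured under identical conditions.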