I have a file that looks like this:
<ExternalPage about="http://animation.about.com/">
<d:Title>About.com: Animation Guide</d:Title>
<d:Description>Keep up with developments in online animation for all skill levels. Download tools, and seek inspiration from online work.</d:Description>
<topic>Top/Arts/Animation</topic>
</ExternalPage>
<ExternalPage about="http://www.toonhound.com/">
<d:Title>Toonhound</d:Title>
<d:Description>British cartoon, animation and comic strip creations - links, reviews and news from the UK.</d:Description>
<topic>Top/Arts/Animation</topic>
</ExternalPage>
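etc.
I am trying to grab the "about" URLs together with the nested titles and descriptions. I tried the following code, but all I get is a bunch of dashes...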
$reader = new XMLReader();
if (!$reader->open("dbpedia/links/xml.xml")) {
    die("Failed to open 'xml.xml'");
}
$num=0;
while($reader->read() && $num<200) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'ExternalPage') {
        $url = $reader->getAttribute('about');
        while ($xml->nodeType !== XMLReader::END_ELEMENT ){
            $reader->read();
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'd:Title') {
                $title=$xmlReader->value;
            }
            elseif ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'd:Description') {
                $desc=$xmlReader->value;
            }
        }
    }
    $num++;echo $url."-".$title."-".$desc."<br />";
}
$reader->close();
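I am new to XMLReader, so I would appreciate it if someone could figure out what I am doing wrong.
Note: I am using XMLReader because the file is huge (millions of lines).
Edit: the beginning of the file looks like this: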
<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<!-- Generated at 2013-02-10 00:03:45 EST from DMOZ 2.0 -->
<Topic r:id="">
<catid>1</catid>
</Topic>
<Topic r:id="Top/Arts">
<catid>381773</catid>
</Topic>
<Topic r:id="Top/Arts/Animation">
<catid>423945</catid>
<link1 r:resource="http://www.awn.com/"></link1>
<link r:resource="http://animation.about.com/"></link>
<link r:resource="http://www.toonhound.com/"></link>
<link r:resource="http://enculturation.gmu.edu/2_1/pisters.html"></link>
<link r:resource="http://www.digitalmediafx.com/Features/animationhistory.html"></link>
<link r:resource="http://www.spark-online.com/august00/media/romano.html"></link>
<link r:resource="http://www.animated-divots.net/"></link>
</Topic>
<ExternalPage about="http://www.awn.com/">
<d:Title>Animation World Network</d:Title>
<d:Description>Provides information resources to the international animation community. Features include searchable database archives, monthly magazine, web animation guide, the Animation Village, discussion forums and other useful resources.</d:Description>
<priority>1</priority>
<topic>Top/Arts/Animation</topic>
</ExternalPage>
etc.
Answer 0 (score: 3)
A pure XMLReader solution would take time and proper debugging. In the meantime, try this hybrid approach:
$xmlR = new XMLReader;
$xmlR->open('dbpedia/links/xml.xml');
// Skip until the first <ExternalPage> node
while ($xmlR->read() && $xmlR->name !== 'ExternalPage');
$loadedNS_f = false;
while ($xmlR->name === 'ExternalPage')
{
    // Read the entire parent tag with children
    $sxmlNode = new SimpleXMLElement($xmlR->readOuterXML());
    // Collect all namespaces in the node recursively once; assuming all nodes are similar
    if (!$loadedNS_f) {
        $tagNS = $sxmlNode->getNamespaces(true);
        $loadedNS_f = true;
    }
    $URL = (string) $sxmlNode['about'];
    $dNS = $sxmlNode->children($tagNS['d']);
    $Title = (string) $dNS->Title;
    $Desc = (string) $dNS->Description;
    $Topic = (string) $sxmlNode->topic;
    var_dump($URL, $Title, $Desc, $Topic);
    // Jump to the next <ExternalPage> tag
    $xmlR->next('ExternalPage');
}
$xmlR->close();
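A variation on the same hybrid idea, offered as a sketch rather than a drop-in replacement: instead of re-parsing the string returned by readOuterXML(), the current node can be expanded to DOM with XMLReader::expand() and handed to SimpleXML through simplexml_import_dom(). The inline $sample string and the $pages array are assumptions made for illustration; for the real file you would call open() with its path instead of XML().

```php
<?php
// Inline sample standing in for the real, much larger file (assumption).
$sample = <<<XML
<RDF xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<ExternalPage about="http://www.toonhound.com/">
<d:Title>Toonhound</d:Title>
<d:Description>British cartoon, animation and comic strip creations.</d:Description>
<topic>Top/Arts/Animation</topic>
</ExternalPage>
</RDF>
XML;

$reader = new XMLReader();
$reader->XML($sample);
$doc   = new DOMDocument(); // owner document for the imported nodes
$pages = [];

// Advance to the first <ExternalPage> start tag, if there is one.
$found = false;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'ExternalPage') {
        $found = true;
        break;
    }
}

while ($found) {
    // Expand only the current subtree to DOM, then view it through SimpleXML.
    $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
    $dc   = $node->children('http://purl.org/dc/elements/1.0/');
    $rdf  = $node->children('http://dmoz.org/rdf/');
    $pages[] = [
        'url'   => (string) $node['about'],
        'title' => (string) $dc->Title,
        'desc'  => (string) $dc->Description,
        'topic' => (string) $rdf->topic,
    ];
    // Skip the rest of this subtree and jump to the next <ExternalPage>.
    $found = $reader->next('ExternalPage');
}
$reader->close();
print_r($pages);
```

Selecting children by namespace URI rather than by prefix keeps the lookup independent of whichever prefix the file happens to use.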
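Answer 1 (score: 1)
The reason it does not work for you is that you are only reading the start tag of the d:Title element, and an element node carries no value of its own:
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'd:Title') {
    $title=$xmlReader->value;
}
Note that the question's code also reads from $xmlReader (and tests $xml in the inner loop) although the reader object is named $reader, so those variables are undefined.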
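You probably want the nodeValue of that DOM element, but that is not what $xmlReader->value returns. Knowing this, there are several ways to solve the problem:
Expand the node (XMLReader::expand()) and take its nodeValue (quick example):
$title = $reader->expand()->nodeValue;
Process all the XMLReader::TEXT (3) and/or XMLReader::CDATA (4) child nodes yourself (check XMLReader::$depth to decide whether a node is a child).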
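The second option, handling the text and CDATA children yourself, could look roughly like the sketch below. The function name readElementText() is hypothetical (it is not part of XMLReader), and the inline $sample string merely stands in for the real file.

```php
<?php
// Hypothetical helper (not part of XMLReader): concatenates the TEXT and
// CDATA nodes found below the element the reader is currently positioned on.
function readElementText(XMLReader $reader): string
{
    // An empty element such as <d:Title/> has no children to read.
    if ($reader->isEmptyElement) {
        return '';
    }
    $depth = $reader->depth;
    $text  = '';
    while ($reader->read()) {
        // Being back at the starting depth means the closing tag was reached.
        if ($reader->depth <= $depth) {
            break;
        }
        if ($reader->nodeType === XMLReader::TEXT
            || $reader->nodeType === XMLReader::CDATA) {
            $text .= $reader->value;
        }
    }
    return $text;
}

// Tiny inline sample standing in for the real file (assumption).
$sample = <<<XML
<RDF xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<ExternalPage about="http://www.toonhound.com/">
<d:Title>Toonhound</d:Title>
<d:Description><![CDATA[British cartoon & comic creations]]></d:Description>
</ExternalPage>
</RDF>
XML;

$reader = new XMLReader();
$reader->XML($sample);
$values = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT
        && in_array($reader->name, ['d:Title', 'd:Description'], true)) {
        // Remember the name first: the helper moves the reader forward.
        $name = $reader->localName;
        $values[$name] = readElementText($reader);
    }
}
$reader->close();
print_r($values);
```

Compared with expand(), this never builds a DOM subtree, at the cost of a little more bookkeeping around the depth.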
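In any case, to simplify your code, consider wrapping exactly what you need in helpers, for example by writing a set of functions of your own or by extending the XMLReader class: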
class MyXMLReader extends XMLReader
{
    public function readToNextElement()
    {
        while (
            $result = $this->read()
            and $this->nodeType !== self::ELEMENT
        ) ;
        return $result;
    }

    public function readToNext($localname)
    {
        while (
            $result = $this->readToNextElement()
            and $this->localName !== $localname
        ) ;
        return $result;
    }

    public function readToNextChildElement($depth)
    {
        // if the current element is the parent and
        // empty there are no children to go into
        if ($this->depth == $depth && $this->isEmptyElement) {
            return false;
        }
        while ($result = $this->read()) {
            if ($this->depth <= $depth) return false;
            if ($this->nodeType === self::ELEMENT) break;
        }
        return $result;
    }

    public function getNodeValue($default = NULL)
    {
        $node = $this->expand();
        return $node ? $node->nodeValue : $default;
    }
}
You can then use this extended class to do the processing:
$reader = new MyXMLReader();
$reader->open($uri);
$num = 0;
while ($reader->readToNext('ExternalPage') and $num < 200) {
    $url = $reader->getAttribute('about');
    $depth = $reader->depth;
    $title = $desc = '';
    while ($reader->readToNextChildElement($depth)) {
        switch ($reader->localName) {
            case 'Title':
                $title = $reader->getNodeValue();
                break;
            case 'Description':
                $desc = trim($reader->getNodeValue());
                break;
        }
    }
    $num++;
    echo "#", $num, ": ", $url, " - ", $title, " - ", $desc, "<br />\n";
}
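As you can see, this makes your code far more readable. You also no longer have to keep track, at every step, of how far you have already read.
Answer 2 (score: 0)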
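Here is another way to get that attribute. Because the document declares a default namespace (http://dmoz.org/rdf/), the XPath expression needs a registered prefix for it; the prefix name m used here is arbitrary:
$string = file_get_contents($filename);
$xml = new SimpleXMLElement($string);
// Bind an arbitrary prefix to the document's default namespace
$xml->registerXPathNamespace('m', 'http://dmoz.org/rdf/');
$result = $xml->xpath('/m:RDF/m:ExternalPage[*]/@about');
var_dump($result);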
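Since file_get_contents() and SimpleXMLElement hold the whole document in memory, this may not be practical for a file with millions of lines. A minimal streaming sketch that collects only the about attributes with XMLReader is shown here; the inline $sample string and the $urls array are assumptions for illustration, and for a file on disk you would call open($filename) instead of XML().

```php
<?php
// Inline sample standing in for the real, much larger file (assumption).
$sample = <<<XML
<RDF xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<ExternalPage about="http://animation.about.com/">
<d:Title>About.com: Animation Guide</d:Title>
</ExternalPage>
<ExternalPage about="http://www.toonhound.com/">
<d:Title>Toonhound</d:Title>
</ExternalPage>
</RDF>
XML;

$reader = new XMLReader();
$reader->XML($sample);
$urls = [];

// Advance to the first <ExternalPage> start tag, if there is one.
$found = false;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'ExternalPage') {
        $found = true;
        break;
    }
}

while ($found) {
    $urls[] = $reader->getAttribute('about');
    // next() skips the children of the current element, so the nested
    // title and description nodes are never visited one by one.
    $found = $reader->next('ExternalPage');
}
$reader->close();
print_r($urls);
```

Only one node is held at a time, so memory use stays flat regardless of the file size.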