I am trying to parse the Project Gutenberg catalog.rdf with SimpleXML, following Basic SimpleXML usage:
$dom = new DOMDocument;
$dom->loadXML($etext_rdf);
if (!$dom) {
echo 'Error while parsing the document';
exit;
}
$etext= simplexml_import_dom($dom);
print_r($etext);
But $etext is always empty. Any hints? This is the $etext_rdf I am passing to loadXML while testing (source of rdf here):
<pgterms:etext rdf:ID="etext26783">
<dc:publisher>&pg;</dc:publisher>
<dc:title rdf:parseType="Literal">The Story of the Kearsarge and Alabama</dc:title>
<dc:creator rdf:parseType="Literal">Browne, A. K.</dc:creator>
<pgterms:friendlytitle rdf:parseType="Literal">The Story of the Kearsarge and Alabama by Browne</pgterms:friendlytitle>
<dc:language><dcterms:ISO639-2><rdf:value>en</rdf:value></dcterms:ISO639-2></dc:language>
<dc:subject>
<rdf:Bag>
<rdf:li><dcterms:LCSH><rdf:value>Kearsarge (Sloop)</rdf:value></dcterms:LCSH></rdf:li>
<rdf:li><dcterms:LCSH><rdf:value>Alabama (Screw sloop)</rdf:value></dcterms:LCSH></rdf:li>
<rdf:li><dcterms:LCSH><rdf:value>United States -- History -- Civil War, 1861-1865 -- Naval operations</rdf:value></dcterms:LCSH></rdf:li>
</rdf:Bag>
</dc:subject>
<dc:subject><dcterms:LCC><rdf:value>E456</rdf:value></dcterms:LCC></dc:subject>
<dc:created><dcterms:W3CDTF><rdf:value>2008-10-06</rdf:value></dcterms:W3CDTF></dc:created>
<pgterms:downloads><xsd:nonNegativeInteger><rdf:value>20</rdf:value></xsd:nonNegativeInteger></pgterms:downloads>
<dc:rights rdf:resource="&lic;" />
</pgterms:etext>
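Answer 0 (score: 2)
Did you make any progress, @giorgio79? Your data was there all along, but print_r is meant for inspecting plain variables and does not give a reliable view of a SimpleXML object. This version of the code shows how to convert the object to a string in the final print_r, then run it through htmlentities so the markup is readable in a browser. The output is identical to the input, because so far the only operation is a straight copy. Good luck with it!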
<?php
$etext_rdf = file_get_contents('http://www.gutenberg.org/ebooks/26783.rdf');
$dom = new DOMDocument;
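// loadXML() returns false on failure; $dom itself is always a truthy object,
// so the return value is what has to be checked.
if (!$dom->loadXML($etext_rdf)) {
    echo 'Error while parsing the document';
    exit;
}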
$etext= simplexml_import_dom($dom);
// do your manipulations on the simpleXML object here..
echo '<pre>';
print_r( htmlentities($etext->saveXML()));
echo '</pre>';
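As for the "manipulations" step: every element in this record is namespaced (pgterms:, dc:, rdf:), and print_r on a SimpleXML object does not list children that live in a namespace, which is why the object looks empty. Here is a minimal sketch of reading them with children() and attributes(). The inline sample and its namespace URIs are assumptions for illustration only; the code takes the URIs from whatever the document declares instead of hardcoding them, and the $records array is just a hypothetical shape for the result.

```php
<?php
// Trimmed-down sample record. The xmlns URIs are assumed for this example;
// the real file declares its own, which getDocNamespaces() picks up below.
$rdf = <<<'XML'
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext26783">
    <dc:title rdf:parseType="Literal">The Story of the Kearsarge and Alabama</dc:title>
    <dc:creator rdf:parseType="Literal">Browne, A. K.</dc:creator>
  </pgterms:etext>
</rdf:RDF>
XML;

$xml = simplexml_load_string($rdf);
if ($xml === false) {
    exit('Error while parsing the document');
}

// Map of prefix => namespace URI, as declared anywhere in the document.
$ns = $xml->getDocNamespaces(true);

$records = [];
// children($uri) selects child elements in that namespace.
foreach ($xml->children($ns['pgterms'])->etext as $etext) {
    $dc = $etext->children($ns['dc']);
    $records[] = [
        // attributes($uri) does the same for namespaced attributes (rdf:ID).
        'id'      => (string) $etext->attributes($ns['rdf'])->ID,
        'title'   => (string) $dc->title,
        'creator' => (string) $dc->creator,
    ];
}

print_r($records);
```

The (string) casts matter: without them you store SimpleXMLElement objects rather than their text. The same pattern should carry over to the nested values such as dc:language, by calling children() again with the dcterms and rdf namespaces at each level.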