How to parse Project Gutenberg RDF with PHP SimpleXML?

Posted: 2013-01-08 10:22:01

Tags: php xml dom simplexml

I am trying to parse the Project Gutenberg catalog.rdf with SimpleXML, following Basic SimpleXML usage:

$dom = new DOMDocument;
$dom->loadXML($etext_rdf);
if (!$dom) {
    echo 'Error while parsing the document';
    exit;
}

$etext= simplexml_import_dom($dom);
print_r($etext);

But $etext is always empty. Any hints? Here is the $etext_rdf I am passing to loadXML (source of rdf here):

<pgterms:etext rdf:ID="etext26783">
  <dc:publisher>&pg;</dc:publisher>
  <dc:title rdf:parseType="Literal">The Story of the Kearsarge and Alabama</dc:title>
  <dc:creator rdf:parseType="Literal">Browne, A. K.</dc:creator>
  <pgterms:friendlytitle rdf:parseType="Literal">The Story of the Kearsarge and Alabama by Browne</pgterms:friendlytitle>
  <dc:language><dcterms:ISO639-2><rdf:value>en</rdf:value></dcterms:ISO639-2></dc:language>
  <dc:subject>
    <rdf:Bag>
      <rdf:li><dcterms:LCSH><rdf:value>Kearsarge (Sloop)</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Alabama (Screw sloop)</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>United States -- History -- Civil War, 1861-1865 -- Naval operations</rdf:value></dcterms:LCSH></rdf:li>
    </rdf:Bag>
  </dc:subject>
  <dc:subject><dcterms:LCC><rdf:value>E456</rdf:value></dcterms:LCC></dc:subject>
  <dc:created><dcterms:W3CDTF><rdf:value>2008-10-06</rdf:value></dcterms:W3CDTF></dc:created>
  <pgterms:downloads><xsd:nonNegativeInteger><rdf:value>20</rdf:value></xsd:nonNegativeInteger></pgterms:downloads>
  <dc:rights rdf:resource="&lic;" />
</pgterms:etext>
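On its own, the fragment above is unlikely to load: the prefixes pgterms:, dc:, dcterms:, rdf: and xsd: are never declared, and &pg; and &lic; are undefined entities, which libxml2 reports as a well-formedness error, so loadXML() returns false. In the full catalog.rdf these come from the root element and its DOCTYPE. The sketch below wraps a single record so that it parses. The helper name wrap_etext is hypothetical, and the pgterms: namespace URI and the two entity values are assumptions modeled on the old catalog header, not copied from it.

```php
<?php
// Hypothetical helper: wrap one <pgterms:etext> record so DOMDocument can parse it.
// ASSUMPTION: the pgterms URI and the pg/lic entity values imitate the old catalog.rdf header.
function wrap_etext(string $fragment): string
{
    return <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY pg "Project Gutenberg">
  <!ENTITY lic "http://www.gutenberg.org/license">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
$fragment
</rdf:RDF>
XML;
}

// A shortened copy of the record from the question
$fragment = <<<'XML'
<pgterms:etext rdf:ID="etext26783">
  <dc:publisher>&pg;</dc:publisher>
  <dc:title rdf:parseType="Literal">The Story of the Kearsarge and Alabama</dc:title>
  <dc:rights rdf:resource="&lic;" />
</pgterms:etext>
XML;

$dom = new DOMDocument;
// loadXML() returns false on failure; the DOMDocument object itself is always truthy
if (!$dom->loadXML(wrap_etext($fragment))) {
    exit("Error while parsing the document\n");
}

// Look the title up by namespace URI + local name
$title = $dom->getElementsByTagNameNS('http://purl.org/dc/elements/1.1/', 'title')->item(0);
echo $title->textContent, "\n";
```

Once loadXML() succeeds, simplexml_import_dom($dom) hands the same tree to SimpleXML.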

1 answer:

Answer 0 (score: 2)

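Have you made any progress, @giorgio79? Your data was there all along; print_r just doesn't show it. When you print_r a SimpleXML object, it only lists child elements without a namespace prefix, and every element in this RDF is prefixed (pgterms:, dc:, rdf:), so the dump looks empty. This version of the code instead serializes the object back to a string and runs it through htmlentities so the markup is readable in a browser. The output is identical to the input, because so far the only operation is a straight copy. Good luck!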

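<?php
// Fetch a single RDF record (requires allow_url_fopen)
$etext_rdf = file_get_contents('http://www.gutenberg.org/ebooks/26783.rdf');

$dom = new DOMDocument;
// A freshly constructed DOMDocument is always truthy, so check
// the return value of loadXML() to detect a parse failure
if ($etext_rdf === false || !$dom->loadXML($etext_rdf)) {
    echo 'Error while parsing the document';
    exit;
}

$etext = simplexml_import_dom($dom);
// do your manipulations on the SimpleXML object here...

// Serialize the object back to XML and escape it so the tags are visible in a browser
echo '<pre>';
echo htmlentities($etext->asXML());
echo '</pre>';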
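To actually read fields rather than echo the whole document, the namespaced elements have to be requested explicitly: SimpleXMLElement::children($prefix, true) for child elements and attributes($prefix, true) for attributes such as rdf:ID. A minimal sketch, assuming the record is loaded with its namespace declarations in place (as in the full .rdf file); the function etext_summary and the trimmed sample document are illustrative, not part of any Gutenberg API.

```php
<?php
// Hypothetical helper: extract a few fields from one <pgterms:etext> element.
// print_r() hides prefixed children, so ask for each namespace explicitly.
function etext_summary(SimpleXMLElement $etext): array
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $dc = $etext->children('dc', true);           // dc:* child elements

    $subjects = [];
    // dc:subject holds either an rdf:Bag of items or a single value;
    // in both cases the text lives in nested rdf:value elements
    foreach ($dc->subject as $subject) {
        $subject->registerXPathNamespace('rdf', $rdfNs);
        foreach ($subject->xpath('.//rdf:value') ?: [] as $value) {
            $subjects[] = (string) $value;
        }
    }

    return [
        'id'       => (string) $etext->attributes('rdf', true)->ID,
        'title'    => (string) $dc->title,
        'creator'  => (string) $dc->creator,
        'subjects' => $subjects,
    ];
}

// Trimmed sample; ASSUMPTION: same namespace URIs as the catalog header
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">
  <pgterms:etext rdf:ID="etext26783">
    <dc:title rdf:parseType="Literal">The Story of the Kearsarge and Alabama</dc:title>
    <dc:creator rdf:parseType="Literal">Browne, A. K.</dc:creator>
    <dc:subject>
      <rdf:Bag>
        <rdf:li><dcterms:LCSH><rdf:value>Kearsarge (Sloop)</rdf:value></dcterms:LCSH></rdf:li>
        <rdf:li><dcterms:LCSH><rdf:value>Alabama (Screw sloop)</rdf:value></dcterms:LCSH></rdf:li>
      </rdf:Bag>
    </dc:subject>
    <dc:subject><dcterms:LCC><rdf:value>E456</rdf:value></dcterms:LCC></dc:subject>
  </pgterms:etext>
</rdf:RDF>
XML;

$xml = simplexml_load_string($sample);
// The root is rdf:RDF; the record is its pgterms:etext child
$etext = $xml->children('pgterms', true)->etext;
print_r(etext_summary($etext));
```

Using prefixes with is_prefix = true keeps the code short, but it depends on the document using those exact prefixes; passing the namespace URIs instead would be more robust.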