Question

I have an XML document that contains the entity "&auml;", as shown below:

<dblp>
<article mdate="2011-12-29" key="tr/trier/MI96-15" publtype="informal publication">
<author>Manfred Laumen</author>
<title>Newton's Method for a Class of Optimal Shape Design Problems</title>
<journal>Universit&auml;t Trier, Mathematik/Informatik, Forschungsbericht</journal>
<volume>96-15</volume>
<year>1996</year>
</article>
</dblp>

How can I parse it?

My code always fails:

import libxml2
doc = libxml2.parseFile('dblp.xml')

Answer 1

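You need an XML DTD that defines &auml;, and it has to be referenced from (or included in) the XML you are parsing. The DBLP DTD looks like the one you need. Just add the appropriate declaration, for example <!DOCTYPE dblp SYSTEM "http://dblp.uni-trier.de/xml/dblp.dtd">, at the top of your XML file, immediately after the <?xml ...> declaration.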

If your file does not already have one, it is trivial to have your script add it.
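
A minimal sketch of such a script, using plain string handling so that it does not depend on any particular parser. The helper name add_doctype is hypothetical, and the sketch assumes the text either starts with an <?xml ...?> declaration or has none at all:

```python
# Hypothetical helper: insert a DOCTYPE right after the XML declaration.
DOCTYPE = '<!DOCTYPE dblp SYSTEM "http://dblp.uni-trier.de/xml/dblp.dtd">'


def add_doctype(xml_text, doctype=DOCTYPE):
    """Return xml_text with doctype inserted, unless one is already present."""
    if "<!DOCTYPE" in xml_text:
        return xml_text  # already declared; leave the document alone
    if xml_text.startswith("<?xml"):
        # The XML declaration ends at the first "?>"; put the DOCTYPE after it.
        end = xml_text.index("?>") + len("?>")
        return xml_text[:end] + "\n" + doctype + xml_text[end:]
    # No XML declaration: the DOCTYPE simply goes first.
    return doctype + "\n" + xml_text
```

Reading dblp.xml, passing its text through add_doctype, and writing the result back would give the parser a DTD to look up. Whether the parser actually fetches an external DTD depends on how it is configured.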

You can also embed the whole DTD in your document:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE dblp [
     <!-- the DTD linked above goes here -->
]>
<!-- the rest of your XML goes here -->
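
As a rough illustration of the embedded form, the sketch below declares only the one entity this record needs instead of pasting in the full DBLP DTD (an assumption made to keep the example short). It uses the standard library's xml.etree.ElementTree rather than libxml2, and the record is trimmed to the relevant element:

```python
import xml.etree.ElementTree as ET

# Assumption: only &auml; is needed, so the internal subset declares just that
# one entity instead of the full DBLP DTD.
XML_TEXT = """<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE dblp [
  <!ENTITY auml "&#228;">
]>
<dblp>
  <article key="tr/trier/MI96-15">
    <journal>Universit&auml;t Trier</journal>
  </article>
</dblp>"""

# Encode to bytes so the encoding declaration matches what the parser receives.
root = ET.fromstring(XML_TEXT.encode("utf-8"))
journal = root.find("article/journal").text
print(journal)
```

With the entity declared, the parser expands &auml; to the character ä and the journal text comes back as "Universität Trier".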

By the way, this has almost nothing to do with Python; any XML parser in any language will choke on an entity that is not defined anywhere.
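
To illustrate that the failure is not specific to libxml2, here is a small sketch with Python's standard-library parser instead. The input string is a cut-down stand-in for the record in the question:

```python
import xml.etree.ElementTree as ET

# No DTD declares &auml; here, so the parser has nothing to expand it to.
UNDECLARED = "<journal>Universit&auml;t Trier</journal>"

try:
    ET.fromstring(UNDECLARED)
    failed = False
except ET.ParseError as exc:
    failed = True
    print("parse failed:", exc)
```

The parse is rejected with a ParseError that reports the undefined entity, which is the same class of error the question runs into.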
