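I'm trying to use ElementTree with some sample data from Microsoft, which I simply copied and pasted into a string (perhaps naively).
I put all of the XML data into a string like this (this is a truncated sample, but I used the full XML):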
data2 = '''
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
etc
etc'''
Then I used this code:
import xml.etree.ElementTree as ET
tree2 = ET.fromstring(data2)
print (tree2.find('author').text)
I got this output:
ParseError: XML or text declaration not at start of entity: line 2, column 0
However, when I try a simple example, it works:
data = '''
<p>
<name>Fred</name>
</p>'''
tree = ET.fromstring(data)
print (tree.find('name').text)
Output:
Fred
Is this because I copied and pasted the data, or is my code incorrect? Please tell me what I'm doing wrong here.
Answer 0 (score: 1)
data2 = '''<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>'''
Don't start the string with a blank line.
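To see the difference for yourself, here is a minimal sketch that parses the same document with and without the leading newline. The `parses` helper and the shortened one-line XML are my own illustration, not part of the original data.

```python
import xml.etree.ElementTree as ET

# Same shape as the question's string: a newline comes before the declaration.
with_blank = '''
<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author></book></catalog>'''


def parses(text):
    """Hypothetical helper: True if ElementTree accepts the text, False on ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


leading_newline_ok = parses(with_blank)       # declaration is on line 2
stripped_ok = parses(with_blank.lstrip())     # declaration is the first thing in the string
print(leading_newline_ok, stripped_ok)        # False True
```

The XML declaration is only allowed as the very first thing in the document, which is why even a single newline in front of it triggers the ParseError.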
Answer 1 (score: 1)
import xml.etree.ElementTree as ET
data2 = '''<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>'''
data2 = data2.strip()
root = ET.fromstring(data2)
for node in root.iter():
    print(node.tag, node.text)
Answer 2 (score: 1)
1 - The first line must be "<?xml version="1.0"?>", so strip data2 first:
import xml.etree.ElementTree as ET
data2 = '''
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
</book>
<book id="bk2">
<author>Gambardella2, Matthew2</author>
</book>
</catalog>
'''
data2 = data2.strip()
tree2 = ET.fromstring(data2)
for book in tree2.findall('book'):
    author = book.find('author').text
    print(author)
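Note that the loop goes through each book before looking up the author. As far as I can tell, the question's `tree2.find('author')` would still fail once the ParseError is fixed, because `ET.fromstring` returns the root `<catalog>` element and `find` with a plain tag name only looks at direct children. A small sketch of the alternatives (the variable names are mine, for illustration only):

```python
import xml.etree.ElementTree as ET

data2 = '''<?xml version="1.0"?>
<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author></book>
  <book id="bk2"><author>Gambardella2, Matthew2</author></book>
</catalog>'''

root = ET.fromstring(data2)        # root is the <catalog> element

# A plain tag name only matches direct children of <catalog>, so this is None.
direct = root.find('author')

# A path steps through <book> first and returns the first matching <author>.
first = root.find('book/author')

# iter() walks all descendants, so it finds every <author> in the document.
everyone = [a.text for a in root.iter('author')]

print(direct, first.text, everyone)
```

Calling `.text` on the `None` result is what would raise an AttributeError in the original code.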
Answer 3 (score: 0)
First, the <?xml version... declaration needs to be at the very beginning of the string.
Your data has a newline at the start, which makes it invalid.
Bad:
data = '''
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
etc
etc'''
assert data[0] == '\n'
Good:
import xml.etree.ElementTree as ET
data = '''<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
</book>
</catalog>'''
catalog = ET.fromstring(data)
# Iterate over child elements directly; getchildren() was removed in Python 3.9.
for book in catalog:
    for author in book:
        print(author.text)
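If you would rather keep the XML starting on its own line in the source file, one option is a backslash right after the opening quotes. This is just a sketch of that idea, and the list comprehension at the end is my own shorthand for the nested loop:

```python
import xml.etree.ElementTree as ET

# The backslash after ''' is a line continuation inside the string literal,
# so the string itself begins with "<?xml" rather than with a newline.
data = '''\
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
  </book>
</catalog>'''

starts_with_declaration = data.startswith('<?xml')
catalog = ET.fromstring(data)

# Each <book> is a child of <catalog>, each <author> a child of <book>.
authors = [author.text for book in catalog for author in book]
print(starts_with_declaration, authors)
```

This only works in a normal (non-raw) string literal; in a raw string the backslash would be kept as a character.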
Answer 4 (score: -1)
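Use replace to remove <?xml version="1.0"?> from data2.
There should be a proper way to specify these things, but I stumbled on this workaround at the time because I was parsing websites whose markup looked very different from valid HTML.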
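For what it's worth, a rough sketch of that workaround is below. It assumes the declaration text matches exactly and that you don't need anything from it (such as an encoding attribute); the shortened XML is only for illustration.

```python
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author></book></catalog>'''

# Drop the declaration entirely; whitespace before the root element is accepted.
without_decl = data2.replace('<?xml version="1.0"?>', '')

root = ET.fromstring(without_decl)
print(root.tag, root.find('book/author').text)
```

Stripping the leading whitespace, as the other answers do, keeps the declaration intact and is probably the safer choice.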