Question

I'm trying to use ElementTree with Microsoft's sample data, which I simply copied and pasted into a string (perhaps naively).

I put all of the XML data into a string, like this (this is a truncated example, but I used the full XML):

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

Then I use this code:

import xml.etree.ElementTree as ET    
tree2 = ET.fromstring(data2)
print (tree2.find('author').text)

This is the output I get:

ParseError: XML or text declaration not at start of entity: line 2, column 0

However, when I try a simple example, it works:

data = '''
<p>
  <name>Fred</name>
</p>'''

tree = ET.fromstring(data)
print (tree.find('name').text)

Output:

Fred

Is this because I copied and pasted, or is my code wrong? Please tell me what I'm doing wrong here.
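Why the simple example works: its string also begins with a newline, but whitespace before the root element is allowed. Only the XML declaration has to be the very first thing in the document, which is what the ParseError is complaining about. A minimal sketch contrasting the two cases, using shortened stand-ins for the question's strings:

```python
import xml.etree.ElementTree as ET

# Leading newline but no XML declaration: whitespace before the root element is fine.
ok = ET.fromstring('\n<p>\n  <name>Fred</name>\n</p>')
print(ok.find('name').text)

# Leading newline *before* the declaration: the declaration is no longer first.
try:
    ET.fromstring('\n<?xml version="1.0"?>\n<catalog/>')
    failed = False
except ET.ParseError as err:
    failed = True
    print("ParseError:", err)
```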

Answer 1

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>'''

Don't start the string with an empty line.
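If you would rather keep the opening ''' on its own line for readability, one option (ordinary Python string syntax, nothing specific to ElementTree) is to end that line with a backslash. The backslash escapes the newline, so the string still starts with <?xml. A short sketch on a trimmed copy of the catalog:

```python
import xml.etree.ElementTree as ET

# The backslash right after ''' escapes the newline, so the string begins with "<?xml".
data2 = '''\
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

root = ET.fromstring(data2)
print(data2[:5])                     # <?xml
print(root.find('.//author').text)   # first <author> at any depth
```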

Answer 2

import xml.etree.ElementTree as ET 

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>'''

data2 = data2.strip()
root = ET.fromstring(data2)

for node in root.iter():
    print(node.tag, node.text)
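root.iter() as used above walks every element, including catalog and book themselves. If only some elements are wanted, Element.iter() also accepts a tag name to filter on, and Element.get() reads an attribute such as id. A small sketch on a trimmed copy of the data:

```python
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101"><author>Gambardella, Matthew</author></book>
   <book id="bk112"><author>Galos, Mike</author></book>
</catalog>
'''.strip()  # strip() removes the newline in front of the declaration

root = ET.fromstring(data2)

# iter('book') visits only <book> elements, at any depth.
pairs = []
for book in root.iter('book'):
    pairs.append((book.get('id'), book.find('author').text))
    print(book.get('id'), book.find('author').text)
```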

Answer 3

1 - The first line must be "<?xml version="1.0"?>", so strip data2 first:

import xml.etree.ElementTree as ET  

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>
'''
data2 = data2.strip()

tree2 = ET.fromstring(data2)

for book in tree2.findall('book'):
    author = book.find('author').text
    print(author)
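Looping over the books also sidesteps a second problem in the question's code: Element.find() with a bare tag name only looks at direct children, and <author> is a grandchild of <catalog>. So tree2.find('author') returns None, and .text on it raises AttributeError even once the declaration is fixed. A sketch of some alternatives, using the same two-book data as above:

```python
import xml.etree.ElementTree as ET

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101"><author>Gambardella, Matthew</author></book>
   <book id="bk2"><author>Gambardella2, Matthew2</author></book>
</catalog>'''

tree2 = ET.fromstring(data2)

print(tree2.find('author'))                          # None: not a direct child
print(tree2.find('book/author').text)                # first author via an explicit path
print(tree2.find('.//author').text)                  # first author at any depth
print([a.text for a in tree2.findall('.//author')])  # every author
```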

Answer 4

First, the <?xml version... declaration needs to be at the very beginning of the string.

Your data has a newline at the start, which makes it invalid.

Bad:

data = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

assert data[0] == '\n'

Good:

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''


catalog = ET.fromstring(data)
# getchildren() was removed in Python 3.9; iterating an Element yields its children
for book in catalog:
    for author in book:
        print(author.text)
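Because the stray newline is invisible when the string is printed normally, it helps to inspect the start of the string directly. A small debugging sketch using plain str methods (a habit, not part of the original answer):

```python
# The same kind of string as the "Bad" example: it begins with a newline.
bad = '''
<?xml version="1.0"?>
<catalog>
</catalog>'''

# repr() makes the invisible leading newline visible.
print(repr(bad[:6]))  # '\n<?xml'

# Check whether the declaration really is the first thing in the string.
print(bad.startswith('<?xml'))           # False
print(bad.lstrip().startswith('<?xml'))  # True
```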

Answer 5

Use replace to remove <?xml version="1.0"?> from data2.

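There should be a proper way to handle this, but I stumbled on it by accident while parsing websites whose markup looked very different from valid HTML.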
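The replace approach might look like the sketch below. Once the declaration is gone, the leftover leading newline is harmless, because whitespace before the root element is allowed. This assumes the declaration text matches the replaced string exactly; a declaration written differently (for example with an encoding attribute or single quotes) would not be removed by this call.

```python
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

# Drop the exact declaration string; the newline left in front of <catalog> is allowed.
cleaned = data2.replace('<?xml version="1.0"?>', '', 1)
root = ET.fromstring(cleaned)
print(root.find('.//author').text)
```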

Parsing XML copied from the clipboard with Python
