Parsing XML from the clipboard with Python

Asked: 2017-01-13 12:54:12

Tags: python

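I am trying to use ElementTree with sample data from Microsoft, which I simply copied and pasted into a string (perhaps naively).

I put all of the XML data into a string, as shown below (this example is truncated, but I used the complete XML):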

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

Then I ran this code:

import xml.etree.ElementTree as ET    
tree2 = ET.fromstring(data2)
print (tree2.find('author').text)

I got this output:

ParseError: XML or text declaration not at start of entity: line 2, column 0

However, when I try a simple example, it works:

data = '''
<p>
  <name>Fred</name>
</p>'''

tree = ET.fromstring(data)
print (tree.find('name').text)

Output:

Fred

Is this because I copied and pasted the data, or is my code incorrect? Please tell me what I am doing wrong here.

5 answers:

Answer 0: (score: 1)

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>'''

Don't start the string with a blank line.

Answer 1: (score: 1)
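
A minimal sketch of why the blank line matters, using only the standard library. The sample is a shortened, closed version of the catalog data, and the names `with_blank_line` and `without_blank_line` are illustrative only. A backslash right after the opening quotes is one way to keep the declaration on its own line in the source without putting a newline in the string.

```python
import xml.etree.ElementTree as ET

# Starts with '\n', so the declaration sits on line 2 of the document.
with_blank_line = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

# The backslash after ''' suppresses the newline, so the string
# starts directly with the declaration.
without_blank_line = '''\
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

try:
    ET.fromstring(with_blank_line)
    failed = False
except ET.ParseError as exc:
    failed = True
    print("ParseError:", exc)

root = ET.fromstring(without_blank_line)
print(root.tag)
```

The first call should raise the same `ParseError` reported in the question, while the second parses normally.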

import xml.etree.ElementTree as ET 

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>'''

data2 = data2.strip()
root = ET.fromstring(data2)

for node in root.iter():
    print(node.tag, node.text)

Answer 2: (score: 1)

1 - The first line must be exactly "<?xml version="1.0"?>", so strip the string first with data2.strip().

import xml.etree.ElementTree as ET  

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>
'''
data2 = data2.strip()

tree2 = ET.fromstring(data2)

for book in tree2.findall('book'):
    author = book.find('author').text
    print(author)
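
One further point that the loop above handles implicitly: `find` and `findall` with a bare tag name only match direct children of the element they are called on, so `tree2.find('author')` on the `<catalog>` root returns `None` even once the string parses. A short sketch, relying on ElementTree's limited XPath support (`.//author`); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

data2 = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
   <book id="bk2">
      <author>Gambardella2, Matthew2</author>
   </book>
</catalog>'''

root = ET.fromstring(data2)

# <author> is a grandchild of <catalog>, so a bare-tag find misses it.
direct = root.find('author')
print(direct)

# './/author' searches all descendants of the root.
authors = [a.text for a in root.findall('.//author')]
print(authors)
```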

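Answer 3: (score: 0)

First, the <?xml version... declaration needs to be at the very beginning of the string.

Your data has a newline at the start, which makes the document invalid.

Bad: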

data = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
        etc 
        etc'''

assert data[0] == '\n'

Good:

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''


catalog = ET.fromstring(data)
# Iterate over elements directly; getchildren() was removed in Python 3.9.
for book in catalog:
    for author in book:
        print(author.text)

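Answer 4: (score: -1)

Use replace to remove <?xml version="1.0"?> from data2.

There should be a way to specify these things, but I stumbled on this at the time because I was parsing websites whose markup looked very different from valid HTML.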
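
A rough sketch of what this answer describes, assuming the declaration matches the literal `<?xml version="1.0"?>` exactly (a declaration with an `encoding` attribute would not match). Once the declaration is gone, leading whitespace before the root element is accepted by the parser, as the question's `<p>` example already shows.

```python
import xml.etree.ElementTree as ET

data2 = '''
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
   </book>
</catalog>'''

# Remove only the first occurrence of the declaration.
no_decl = data2.replace('<?xml version="1.0"?>', '', 1)

root = ET.fromstring(no_decl)
print(root.find('book/author').text)
```

Stripping the string, as in the other answers, keeps the declaration intact and is usually the simpler fix.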