Python XML parsing with ElementTree returns None

Time: 2015-07-24 14:35:12

Tags: python xml elementtree

I am trying to parse this XML string using ElementTree in Python.

The data, stored as a string:

xml = '''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>abel@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>joseph@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>3</RollNumber>
    <Name>Mike</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>mike@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
</SearchResults>'''

The code I use to parse this string as XML:

from xml.etree import ElementTree
xml = ElementTree.fromstring(xml)
results = xml.findall('Student')
for students in results:
    for student in students:
        print(student.get('Name'))

Printing the results shows them as Elements:

print(results)
[<Element 'Student' at 0x7feb615b4ad0>, <Element 'Student' at 0x7feb615b4c50>, <Element 'Student' at 0x7feb615b4e10>]

Inside the for loop, printing students shows the same elements:

print(students)
<Element 'Student' at 0x7fd722d88ad0>
<Element 'Student' at 0x7fd722d88c50>
<Element 'Student' at 0x7fd722d88e10>

However, when I try to get the student's name with student.get('Name'), the program returns None.

What I want to do is extract the value of each tag from the XML and build a dictionary.

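2 answers:

Answer 0: (score: 4)

You have a double loop:

for students in results:
    for student in students:
        print(student.get('Name'))

students is a single <Student> element. By iterating over it, you get the individual elements it contains. Those contained elements (<RollNumber>, <Name>, etc.) have no Name attribute.

The .get() method only accesses attributes, but you appear to want the <Name> element. Use .find() or an XPath expression instead: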

for student in results:
    name = student.find('Name')
    if name is not None:
        print(name.text)

or, with an XPath expression:

for student_name in xml.findall('.//Student/Name'):
    print(student_name.text)
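
Since the stated goal is a dictionary per student, the following is a minimal sketch of one way to build it with the standard library alone. It assumes every child of <Student> is a simple text element, as in the question's data. The names XML_TEXT and student_to_dict are made up for this illustration, and the document is shortened to two students with three fields each.

```python
from xml.etree import ElementTree

# Shortened copy of the question's document (hypothetical sample data).
XML_TEXT = """<SearchResults>
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <Grade>7</Grade>
</Student>
</SearchResults>"""


def student_to_dict(student):
    """Map each child tag of one <Student> element to its text."""
    return {child.tag: child.text for child in student}


root = ElementTree.fromstring(XML_TEXT)

# One dictionary per <Student>, in document order.
students = [student_to_dict(s) for s in root.findall('Student')]
print(students)

# For contrast: .get() looks up an attribute, and <Student> has none named Name.
print(root.find('Student').get('Name'))
```

Iterating over an element yields its direct children, which is why a single loop per student is enough here.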

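Answer 1: (score: 2)

If you are new to XML processing:

  • lxml is a fast and powerful library for working with XML in Python. The standard library does not have full XPath support.
  • XPath is a query language for inspecting XML documents. It has a steep learning curve, but help is easy to find on StackOverflow. XPath is so useful that, when working with APIs, I have started converting JSON to XML so that I can write XPath queries instead of crazy nested dictionary dereferences.
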
from lxml import etree
from pprint import pprint

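# Bytes literal: on Python 3, lxml may raise ValueError for a str that carries an encoding declaration.
doc = etree.XML(b'''<?xml version="1.0" encoding="utf-8"?>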
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>abel@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>joseph@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>3</RollNumber>
    <Name>Mike</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>mike@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
</SearchResults>''')

def first(seq,default=None):
  # Return the first item of seq, or default if seq is empty.
  for item in seq:
    return item
  return default

def simple_children_to_dict(element):
  # Map each direct child's tag to its text content.
  result = {}
  for child in element:
    result[child.tag] = child.text
  return result

def get_by_rollnumber(number,search_results):
  # $number is an XPath variable, bound through the keyword argument.
  student_element = first(search_results.xpath('Student[./RollNumber=$number]',number=number))
  if student_element is None:
    raise Exception("Student Number {0} not found".format(number))
  return simple_children_to_dict(student_element)

def get_all_students(search_results):
  # Query the element that was passed in rather than the global doc.
  students = []
  for student_element in search_results.xpath('Student'):
    students.append(simple_children_to_dict(student_element))
  return students

Then:

>>> pprint(get_by_rollnumber(2,doc))
{'Email': 'joseph@hisschool.edu',
 'Grade': '7',
 'Name': 'Joseph',
 'PhoneNumber': 'Not Included',
 'RollNumber': '2'}
>>>
>>> pprint(get_all_students(doc))
[{'Email': 'abel@hisschool.edu',
  'Grade': '7',
  'Name': 'Abel',
  'PhoneNumber': 'Not Included',
  'RollNumber': '1'},
 {'Email': 'joseph@hisschool.edu',
  'Grade': '7',
  'Name': 'Joseph',
  'PhoneNumber': 'Not Included',
  'RollNumber': '2'},
 {'Email': 'mike@hisschool.edu',
  'Grade': '7',
  'Name': 'Mike',
  'PhoneNumber': 'Not Included',
  'RollNumber': '3'}]
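
For readers who cannot install lxml, a roll-number lookup like the one above can also be sketched with the standard library, whose limited XPath subset does include the [tag='text'] predicate. This is only an illustration under a few assumptions: find_by_rollnumber and XML_TEXT are hypothetical names, the document is shortened, and the roll number is interpolated into the path because ElementTree has no XPath variables, so it suits simple, trusted values only.

```python
from xml.etree import ElementTree

# Shortened copy of the document (hypothetical sample data).
XML_TEXT = """<SearchResults>
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
</Student>
</SearchResults>"""


def find_by_rollnumber(root, number):
    """Return the matching <Student> as a dict, using ElementTree's XPath subset."""
    # [RollNumber='2'] keeps only <Student> elements whose <RollNumber>
    # child has exactly that text content.
    student = root.find("Student[RollNumber='{0}']".format(number))
    if student is None:
        raise LookupError("Student Number {0} not found".format(number))
    return {child.tag: child.text for child in student}


root = ElementTree.fromstring(XML_TEXT)
print(find_by_rollnumber(root, 2))
```

Because find() returns the first match or None, no separate first helper is needed in this variant.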

Subtleties:

  • XPath queries generally return a result set, because most queries can have more than one match. Hence the first helper function.
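
As a small illustration of that point, the sketch below shows an equivalent way to write such a helper with next() and iter(), applied to the list that the standard library's findall() returns. It is not the answer's own implementation, only an alternative under the assumption that any iterable of matches is passed in; the inline sample document is made up for the example.

```python
from xml.etree import ElementTree


def first(seq, default=None):
    """Return the first item of seq, or default when seq is empty."""
    return next(iter(seq), default)


root = ElementTree.fromstring(
    "<SearchResults>"
    "<Student><Name>Abel</Name></Student>"
    "<Student><Name>Joseph</Name></Student>"
    "</SearchResults>"
)

names = root.findall('Student/Name')      # always a list, possibly empty
first_name = first(names)                 # first matching element
missing = first(root.findall('Teacher'))  # no match, so the default (None)

# ElementTree's find() has the same "first match or None" behaviour built in.
same_as_find = root.find('Student/Name')

print(first_name.text, missing, same_as_find.text)
```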