I am trying to parse this XML string using ElementTree in Python. The data is stored as a string:
xml = '''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
<RollNumber>1</RollNumber>
<Name>Abel</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>abel@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>2</RollNumber>
<Name>Joseph</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>joseph@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>3</RollNumber>
<Name>Mike</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>mike@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
</SearchResults>'''
This is the code I use to parse the string into XML:
from xml.etree import ElementTree
xml = ElementTree.fromstring(xml)
results = xml.findall('Student')
for students in results:
    for student in students:
        print(student.get('Name'))
Printing the results shows a list of Elements:
print(results)
[<Element 'Student' at 0x7feb615b4ad0>, <Element 'Student' at 0x7feb615b4c50>, <Element 'Student' at 0x7feb615b4e10>]
Inside the for loop, printing students shows the same thing:
print(students)
<Element 'Student' at 0x7fd722d88ad0>
<Element 'Student' at 0x7fd722d88c50>
<Element 'Student' at 0x7fd722d88e10>
However, when I try to get a student's name with student.get('Name'), the program returns None.
What I want to do is extract the value of each tag from the XML and build a dictionary.
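Answer 0 (score: 4)
You have a double loop:
for students in results:
    for student in students:
        print(student.get('Name'))
Here students is a single <Student> element. Iterating over it yields the individual elements it contains. Those contained elements (<RollNumber>, <Name>, etc.) have no Name attribute.
The .get() method only accesses attributes, but you appear to want the <Name> element. Use .find() or an XPath expression instead: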
for student in results:
    # .find() returns the first matching child element, or None
    name = student.find('Name')
    if name is not None:
        print(name.text)
or
for student_name in xml.findall('.//Student/Name'):
    print(student_name.text)
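Since the question's goal is a dictionary per student, the idea can be sketched with the standard library alone. This is a minimal sketch, not part of the original answer: the helper name student_to_dict and the shortened sample document are assumptions made for illustration.

```python
from xml.etree import ElementTree

# Shortened sample document (assumption: same shape as the XML in the question).
SAMPLE = """<SearchResults>
  <Student><RollNumber>1</RollNumber><Name>Abel</Name><Grade>7</Grade></Student>
  <Student><RollNumber>2</RollNumber><Name>Joseph</Name><Grade>7</Grade></Student>
</SearchResults>"""


def student_to_dict(student):
    """Map each child tag of a <Student> element to its text (hypothetical helper)."""
    return {child.tag: child.text for child in student}


root = ElementTree.fromstring(SAMPLE)
first_student = root.find('Student')

# .get() looks up XML attributes; <Student> has none, so this is None.
attribute_value = first_student.get('Name')
# .find() looks up child elements; this is the <Name> element.
name_element = first_student.find('Name')

# One dictionary per <Student> element.
students = [student_to_dict(s) for s in root.findall('Student')]

print(attribute_value)
print(name_element.text)
print(students)
```

The single loop over root.findall('Student') is the point: each item is already one student, so no inner loop is needed to reach its child tags.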
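Answer 1 (score: 2)
If you are new to XML processing, pick a library with xpath support; the code here uses lxml.
xpath is so useful that, when working with APIs, I have started converting JSON to XML so that I can write xpath queries instead of crazy nested dictionary dereferencing.
from lxml import etree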
from pprint import pprint
doc = etree.XML('''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
<RollNumber>1</RollNumber>
<Name>Abel</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>abel@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>2</RollNumber>
<Name>Joseph</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>joseph@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
<Student>
<RollNumber>3</RollNumber>
<Name>Mike</Name>
<PhoneNumber>Not Included</PhoneNumber>
<Email>mike@hisschool.edu</Email>
<Grade>7</Grade>
</Student>
</SearchResults>''')
def first(seq, default=None):
    # Return the first item of seq, or default if seq is empty.
    for item in seq:
        return item
    return default


def simple_children_to_dict(element):
    # Map each direct child's tag to its text content.
    result = {}
    for child in element:
        result[child.tag] = child.text
    return result


def get_by_rollnumber(number, search_results):
    # $number is an XPath variable bound through the keyword argument.
    student_element = first(search_results.xpath('Student[./RollNumber=$number]', number=number))
    if student_element is None:
        raise Exception("Student Number {0} not found".format(number))
    return simple_children_to_dict(student_element)


def get_all_students(search_results):
    students = []
    # Query the element that was passed in rather than the global doc.
    for student_element in search_results.xpath('Student'):
        students.append(simple_children_to_dict(student_element))
    return students
Then:
>>> pprint(get_by_rollnumber(2, doc))
{'Email': 'joseph@hisschool.edu',
 'Grade': '7',
 'Name': 'Joseph',
 'PhoneNumber': 'Not Included',
 'RollNumber': '2'}
>>>
>>> pprint(get_all_students(doc))
[{'Email': 'abel@hisschool.edu',
  'Grade': '7',
  'Name': 'Abel',
  'PhoneNumber': 'Not Included',
  'RollNumber': '1'},
 {'Email': 'joseph@hisschool.edu',
  'Grade': '7',
  'Name': 'Joseph',
  'PhoneNumber': 'Not Included',
  'RollNumber': '2'},
 {'Email': 'mike@hisschool.edu',
  'Grade': '7',
  'Name': 'Mike',
  'PhoneNumber': 'Not Included',
  'RollNumber': '3'}]
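A subtlety:
xpath queries generally return a result set, because most queries can have more than one match. That is why the helper function first is used.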
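The same "first match or a default" idea can also be sketched with the standard library. This assumes the limited XPath subset in xml.etree.ElementTree is enough for the lookup; find_student and the shortened sample document are hypothetical, and next(iter(...), default) stands in for the first helper.

```python
from xml.etree import ElementTree

# Shortened sample document (assumption: same shape as the XML in the question).
SAMPLE = """<SearchResults>
  <Student><RollNumber>1</RollNumber><Name>Abel</Name></Student>
  <Student><RollNumber>2</RollNumber><Name>Joseph</Name></Student>
</SearchResults>"""


def find_student(root, roll_number, default=None):
    """Return the first <Student> with the given roll number, else default (hypothetical helper)."""
    # findall() always returns a list, which may be empty.
    matches = root.findall("Student[RollNumber='{0}']".format(roll_number))
    # Take the first match, or fall back to the default when there is none.
    return next(iter(matches), default)


root = ElementTree.fromstring(SAMPLE)
found = find_student(root, 2)
missing = find_student(root, 99)

print(found.find('Name').text)
print(missing)
```

Returning a default instead of indexing with [0] avoids an IndexError when nothing matches, which leaves the caller free to decide whether a missing student is an error.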