I am using lxml.objectify to parse an XML file in Python 3:
<root>
<object_header></object_header>
<object_details></object_details>
<object_details></object_details>
<object_header></object_header>
<object_details></object_details>
<object_header></object_header>
</root>
Note that an object sometimes has no details.
The way I am currently parsing this (it works, but it is not elegant) is as follows:
from lxml import objectify, etree
root = objectify.parse(xmlFile).getroot()
elems = [el for el in root.iterchildren()]
# data is a list of objects.
data = []
# Has to be instantiated outside the for loop in case the last object has no details.
objectDetails = ''
# Don't store the first object right away.
firstObject = True
# Iterate through each XML element.
for elem in elems:
    if elem.tag == 'object_header':
        # Remember object header info.
        object = storeHeaderInfo(objectDetails)
        # Skip saving if this is the first object; its details still need to be grabbed.
        if firstObject:
            # Don't skip again, in case an object has no details.
            firstObject = False
            continue
        # Save the object; its details have already been grabbed.
        data.append(object)
    else:
        # Process object details in the <object_details> tag.
        # encoding='unicode' makes tostring() return str rather than bytes in Python 3.
        objectDetails += etree.tostring(elem, encoding='unicode')
# Save the last object.
object = storeHeaderInfo(objectDetails)
data.append(object)
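What I don't like is that I have written the code that stores an object twice: once for each object inside the for loop, and then again for the last object.
Is there a more Pythonic or elegant way to do this?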
Answer 0 (score: 2)
You can make things simpler if you use the XPath expression following-sibling::*:
from lxml import objectify, etree
root = objectify.parse("input.xml").getroot()
elems = root.xpath("//object_header")
for elem in elems:
    header = elem.text
    objectDetails = ''
    for sibling in elem.xpath("following-sibling::*"):
        if sibling.tag == 'object_header':
            break
        objectDetails += str(etree.tostring(sibling))
    print(header, objectDetails)
Given the following input:
<root>
<object_header>object1</object_header>
<object_details>detail1</object_details>
<object_details>detail2</object_details>
<object_header>object2</object_header>
<object_details>detail1</object_details>
<object_header>object3</object_header>
</root>
The code prints:
object1 b'<object_details>detail1</object_details>'b'<object_details>detail2</object_details>'
object2 b'<object_details>detail1</object_details>'
object3