I have some XML generated from an HTTP response:
<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
      <PerfData attrId="cpuUsage" attrName="Usage">
        <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
        <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
        <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
      </PerfData>
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
        <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
        <PerfData attrId="cpuUsage" attrName="Usage">
          <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
        </PerfData>
      </Resource>
    </Resource>
  </Results>
</Response>
If you look closely, a Resource tag is nested inside another tag of the same name.
So the high-level XML structure looks like this:
<Resource>
    <Resource>
    </Resource>
</Resource>
With Python's ElementTree I can only get at the outer element. Here is my code:
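import re
# assumed alias: the original post does not show its imports
import xml.etree.ElementTree as xml

# `data` holds the raw text of the HTTP response
pattern = re.compile(r'(<Response.*?</Response>)',
                     re.VERBOSE | re.MULTILINE)

for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)

    for results in responses:
        result = results.tag

        # iterates only over the direct children of <Results>
        for resources in results:
            resource = resources.tag
            temp = resources.attrib
            print(temp)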
This prints the following output (temp):
{'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
How can I get the attributes of the inner element?
Answer 0 (score: 2)
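Don't parse XML with regular expressions! It doesn't work reliably. Use an XML parsing library instead, for example lxml:
Edit: the code example now fetches only the top-level resources, loops over them, and tries to fetch their "sub-resources". This was changed following the OP's request in the comments.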
from lxml import etree

content = '''
YOUR XML HERE
'''

root = etree.fromstring(content)

# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy resource attributes into a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find only one resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # else... no idea what the OP wants...

    print(mashup)
That will output:
{'resource': {'DisplayName': 'VM', 'type': 'vm', 'name': 'charlie', 'baseHost': 'tango'}, 'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}
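If installing lxml is not an option, or a host may contain more than one nested resource, roughly the same idea can be sketched with the standard library's ElementTree by walking the nested Resource elements recursively. This is only a sketch under some assumptions: the helper names `resource_to_dict` and `parse_resources`, the `'resources'` key, and the trimmed-down sample XML are made up for illustration and are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample modeled on the XML in the question (illustrative only).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango"/>
    </Resource>
  </Results>
</Response>"""


def resource_to_dict(element):
    """Hypothetical helper: copy the attributes and recurse into child Resources."""
    info = dict(element.attrib)
    # findall("Resource") matches direct children only, so each level
    # of nesting is handled by one level of recursion.
    children = [resource_to_dict(child) for child in element.findall("Resource")]
    if children:
        info["resources"] = children
    return info


def parse_resources(xml_text):
    """Hypothetical helper: return one dict per top-level Resource."""
    # Passing bytes lets the parser honor the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Top-level resources are the direct children of <Results>.
    return [resource_to_dict(r) for r in root.findall("./Results/Resource")]


if __name__ == "__main__":
    for host in parse_resources(SAMPLE):
        print(host)
```

Each top-level dict carries its own attributes, and any nested resources appear as a list under the `'resources'` key, so the inner `name` and `baseHost` values can be read the same way as the outer ones.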