I'm new to Python and to coding in general, so please bear with my question.
Here is the XML I'm working with:
<?xml version="1.0" encoding="utf-8"?>
<Total>
<ID>999</ID>
<Response>
<Detail>
<Nix>
<Check>pass</Check>
</Nix>
<MaxSegment>
<Status>V</Status>
<Input>
<Name>
<First>jack</First>
<Last>smiths</Last>
</Name>
<Address>
<StreetAddress1>100 rodeo dr</StreetAddress1>
<City>long beach</City>
<State>ca</State>
<ZipCode>90802</ZipCode>
</Address>
<DriverLicense>
<Number>123456789</Number>
<State>ca</State>
</DriverLicense>
<Contact>
<Email>x@me.com</Email>
<Phones>
<Home>0000000000</Home>
<Work>1111111111</Work>
</Phones>
</Contact>
</Input>
<Type>Regular</Type>
</MaxSegment>
</Detail>
</Response>
</Total>
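What I want to do is extract these values into a nice, clean table.
Here is my code so far, but I can't figure out how to get at the nested child elements: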
import os
os.chdir('d:/py/xml/')
import xml.etree.ElementTree as ET
tree = ET.parse('xxml.xml')
root=tree.getroot()
x = root.tag
y = root.attrib
print(x,y)
#---PRINT ALL NODES---
for child in root:
    print(child.tag, child.attrib)
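Thanks in advance!
Answer 0 (score: 2)
You can create a dictionary that maps each column name to the XPath expression that extracts the corresponding value, for example: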
xpath = {
    "ID": "/Total/ID/text()",
    "Check": "/Total/Response/Detail/Nix/Check/text()",  # or "//Check/text()"
}
To populate a table row (note that lxml's xpath() returns a list of matches for each expression):
row = {name: tree.xpath(path) for name, path in xpath.items()}
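The above assumes you are using lxml, which supports the full XPath 1.0 syntax. ElementTree supports only a subset of XPath expressions, but that may be enough in your case (you can drop the "text()" step and use el.text instead), for example: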
xpath = {
    "ID": ".//ID",
    "Check": ".//Check",
}
row = {name: tree.findtext(path) for name, path in xpath.items()}
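As a slightly fuller sketch of the same idea using only the standard library: the column names below ("First Name", "Address State", "Fax" and so on) are made up for illustration, since the target table is not shown in the question, and the XML is a trimmed copy of the sample. Anchoring a path on the parent tag is one way to tell the two State elements apart, and findtext() takes a default for values that are missing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question.
XML = """\
<Total>
  <ID>999</ID>
  <Response>
    <Detail>
      <Nix><Check>pass</Check></Nix>
      <MaxSegment>
        <Status>V</Status>
        <Input>
          <Name><First>jack</First><Last>smiths</Last></Name>
          <Address><City>long beach</City><State>ca</State></Address>
          <DriverLicense><Number>123456789</Number><State>ca</State></DriverLicense>
        </Input>
        <Type>Regular</Type>
      </MaxSegment>
    </Detail>
  </Response>
</Total>
"""

# Hypothetical column names mapped to ElementTree-compatible paths.
COLUMNS = {
    "ID": "ID",                                # direct child of the root
    "Check": ".//Check",                       # any descendant named Check
    "First Name": ".//Name/First",
    "Last Name": ".//Name/Last",
    "Address State": ".//Address/State",       # parent tag disambiguates State
    "License State": ".//DriverLicense/State",
    "Fax": ".//Contact/Fax",                   # not in the sample: falls back to ""
}

root = ET.fromstring(XML)
row = {name: root.findtext(path, default="") for name, path in COLUMNS.items()}

for name, value in row.items():
    print(f"{name:15} {value}")
```

Each XML document then yields one such row, and a list of rows can be handed to csv.DictWriter or a similar tool to produce the table.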
To print all text together with the corresponding tag names:
import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

for _, el in etree.iterparse("xxml.xml"):
    if el.text and len(el) == 0:  # leaf element with text
        print(el.tag, el.text)
If the column names differ from the tag names (as in your case), this last example is not enough to build the table.
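One possible way around that limitation, sketched here as an assumption rather than as part of the original answer, is to track the open tags during iterparse and key each leaf by its full path. The helper name leaf_paths is hypothetical, and the XML is a cut-down version of the sample.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down version of the sample XML, as bytes so it can be read like a file.
XML = b"""<Total>
  <ID>999</ID>
  <Input>
    <Address><State>ca</State></Address>
    <DriverLicense><Number>123456789</Number><State>ca</State></DriverLicense>
  </Input>
</Total>"""


def leaf_paths(source):
    """Yield (path, text) for every leaf element that has non-blank text."""
    stack = []  # tags of the elements that are currently open
    for event, el in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(el.tag)
        else:
            # On "end" the element is complete, so len(el) is its child count.
            if len(el) == 0 and el.text and el.text.strip():
                yield "/".join(stack), el.text.strip()
            stack.pop()


rows = dict(leaf_paths(io.BytesIO(XML)))
for path, text in rows.items():
    print(path, text)
```

With path keys, the two State elements end up under Total/Input/Address/State and Total/Input/DriverLicense/State, so a separate mapping from path to column name can rename them independently.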
Answer 1 (score: 2)
Here is how you can traverse the tree and print only the text nodes:
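def traverse(node):
    show = True
    for c in node:  # iterate over direct children (getchildren() was removed in Python 3.9)
        show = False
        traverse(c)
    if show:  # no children, so this is a leaf element
        print(node.tag, node.text)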
For your example, I get the following:
traverse(root)
ID 999
Check pass
Status V
First jack
Last smiths
StreetAddress1 100 rodeo dr
City long beach
State ca
ZipCode 90802
Number 123456789
State ca
Email x@me.com
Home 0000000000
Work 1111111111
Type Regular
Instead of printing them, you can store (node.tag, node.text) tuples in a list, or store {node.tag: node.text} in a dictionary.
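A minimal sketch of both storage options follows. The generator name leaves is hypothetical, and the sample is shortened, with the second State changed to "tx" purely so the difference is visible. It shows one thing to watch for: a dictionary keeps only one value per tag, so repeated tags such as State overwrite each other, while a list of tuples keeps them all.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the second State value is altered for illustration only.
XML = """<Total>
  <ID>999</ID>
  <Address><State>ca</State></Address>
  <DriverLicense><State>tx</State></DriverLicense>
</Total>"""


def leaves(node):
    """Yield (tag, text) for every element that has no children."""
    if len(node) == 0:
        yield node.tag, node.text
    for child in node:
        yield from leaves(child)


root = ET.fromstring(XML)
pairs = list(leaves(root))  # keeps every leaf, in document order
as_dict = dict(pairs)       # later duplicates overwrite earlier ones

print(pairs)    # [('ID', '999'), ('State', 'ca'), ('State', 'tx')]
print(as_dict)  # {'ID': '999', 'State': 'tx'}
```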