Python: Flatten XML to CSV with the parent tag repeated for each child

Date: 2019-05-28 13:35:30

Tags: python xml

My XML looks like this:

<RootTag xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Branch>
        <BranchID>100</BranchID>
        <Child>
            <policy1>hello</policy1>
            <policy2>how are you</policy2>
        </Child>
        <Child>
            <policy1>hello1</policy1>
            <policy2>how are you1</policy2>
        </Child>
    </Branch>
    <Branch>
        <BranchID>200</BranchID>
        <Child>
            <policy1>I am good</policy1>
            <policy2>how about you</policy2>
        </Child>
        <Child>
            <policy1>I am good1</policy1>
            <policy2>how about you1</policy2>
        </Child>
    </Branch>
</RootTag>

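I tried using getiterator (because I am on Python 2.6) to get all the elements. I can flatten them, but the value of the parent tag (BranchID) should appear in every row produced from its children.

Expected output: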

BranchID,policy1,policy2
100,hello,how are you
100,hello1,how are you1
200,I am good,how about you
200,I am good1,how about you1

1 answer:

Answer 0: (score: 0)

Try the code below (remember to replace YOUR FILE HERE with the actual file path):

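import xml.etree.ElementTree as ET

tree = ET.parse('YOUR FILE HERE')  # replace with the path to your XML file
root = tree.getroot()              # the <RootTag> element

print("BranchID,policy1,policy2")  # CSV header row
# Each <Branch> has one BranchID, repeated on the row of every <Child>.
for branch in root.findall('.//Branch'):
    branch_id = branch.find('BranchID').text
    for child in branch.findall('.//Child'):
        policy1 = child.find('policy1').text
        policy2 = child.find('policy2').text
        # Explicit field indexes: auto-numbered "{}" requires Python 2.7+.
        print("{0},{1},{2}".format(branch_id, policy1, policy2))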
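
If any policy text could itself contain a comma or a quote, joining the fields by hand would produce a broken CSV. The following is a minimal sketch of the same nested loop using the standard csv module for quoting. It assumes Python 3 (not the 2.6 from the question), and the names SAMPLE_XML, flatten_branches and xml_to_csv are made up for illustration; the sample is a shortened copy of the XML above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (illustrative only).
SAMPLE_XML = """<RootTag>
  <Branch>
    <BranchID>100</BranchID>
    <Child><policy1>hello</policy1><policy2>how are you</policy2></Child>
    <Child><policy1>hello1</policy1><policy2>how are you1</policy2></Child>
  </Branch>
  <Branch>
    <BranchID>200</BranchID>
    <Child><policy1>I am good</policy1><policy2>how about you</policy2></Child>
  </Branch>
</RootTag>"""


def flatten_branches(root):
    """Yield one (BranchID, policy1, policy2) tuple per <Child> element."""
    for branch in root.findall("Branch"):
        # findtext returns the default when the tag is missing,
        # so a Branch without a BranchID does not raise.
        branch_id = branch.findtext("BranchID", default="")
        for child in branch.findall("Child"):
            yield (
                branch_id,
                child.findtext("policy1", default=""),
                child.findtext("policy2", default=""),
            )


def xml_to_csv(xml_text):
    """Return the flattened rows as CSV text, header row included."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    # csv.writer quotes fields that contain commas, quotes or newlines.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["BranchID", "policy1", "policy2"])
    writer.writerows(flatten_branches(root))
    return buf.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML), end="")
```

To write to a file instead, the same writer can be pointed at a file opened with open(path, "w", newline="") in place of the StringIO buffer.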