Converting strings to nested XML data

Time: 2013-04-23 09:06:02

Tags: python xml python-3.x

I have the following strings:

"Sweden, Västmanland, Västerås" 
"Sweden, Dalarna, Leksand" 
"Ireland, Cork, Cobh"
"Ireland, Clare, Boston"
"Ireland, Cork, Baltimore"
"Sweden, Dalarna, Mora" 

I would like to convert them to XML like this:

<?xml version="1.0" ?> 
<data>
<country name = "Ireland">
    <region name = "Clare">
        <settlement name  = "Boston"/>
    </region>
    <region name = "Cork">
        <settlement name = "Baltimore"/>
        <settlement name = "Cobh"/>
    </region>
</country>

<country name = "Sweden">
    <region name = "Dalarna">
        <settlement name = "Leksand"/>
        <settlement name = "Mora"/>
    </region>
    <region name = "Västmanland">
        <settlement name = "Västerås"/>
    </region>
</country>
</data>

Is there a built-in Python 3 library that could help with this conversion, so that I don't reinvent the wheel unnecessarily?

2 answers:

Answer 0 (score: 2)

import xml.etree.ElementTree as ET
from collections import defaultdict

strings = ["Sweden, Västmanland, Västerås",
"Sweden, Dalarna, Leksand",
"Ireland, Cork, Cobh",
"Ireland, Clare, Boston",
"Ireland, Cork, Baltimore",
"Sweden, Dalarna, Mora"]

dd = defaultdict(lambda: defaultdict(list))

for s in strings:
    a, b, c = s.split(', ')
    dd[a][b].append(c)

root = ET.Element('data')

for c, regions in dd.items():
    country = ET.SubElement(root,  'country', {'name': c})
    for r, settlements in regions.items():
        region = ET.SubElement(country, 'region', {'name': r})
        for s in settlements:
            settlement = ET.SubElement(region, 'settlement', {'name': s})


import xml.dom.minidom # just to pretty print for this example
print(xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml())

<?xml version="1.0" ?>
<data>
    <country name="Ireland">
        <region name="Cork">
            <settlement name="Cobh"/>
            <settlement name="Baltimore"/>
        </region>
        <region name="Clare">
            <settlement name="Boston"/>
        </region>
    </country>
    <country name="Sweden">
        <region name="Dalarna">
            <settlement name="Leksand"/>
            <settlement name="Mora"/>
        </region>
        <region name="Västmanland">
            <settlement name="Västerås"/>
        </region>
    </country>
</data>
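
Note that the output above is not sorted, whereas the question lists countries, regions, and settlements alphabetically. A minimal sketch of one way to get that ordering, assuming plain `sorted()` (code-point order, not locale-aware collation) is good enough; `build_sorted_tree` is a helper name made up for this illustration:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_sorted_tree(strings):
    """Group "country, region, settlement" strings into a sorted XML tree."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s in strings:
        # Split on commas and trim whitespace around each field.
        country, region, settlement = (part.strip() for part in s.split(","))
        grouped[country][region].append(settlement)

    root = ET.Element("data")
    # sorted() orders by code point; this is an assumption about what
    # "alphabetical" should mean for names with non-ASCII characters.
    for country in sorted(grouped):
        country_el = ET.SubElement(root, "country", {"name": country})
        for region in sorted(grouped[country]):
            region_el = ET.SubElement(country_el, "region", {"name": region})
            for settlement in sorted(grouped[country][region]):
                ET.SubElement(region_el, "settlement", {"name": settlement})
    return root


strings = ["Sweden, Västmanland, Västerås",
           "Sweden, Dalarna, Leksand",
           "Ireland, Cork, Cobh",
           "Ireland, Clare, Boston",
           "Ireland, Cork, Baltimore",
           "Sweden, Dalarna, Mora"]

root = build_sorted_tree(strings)
print(ET.tostring(root, encoding="unicode"))
```

The returned element can be pretty-printed with `xml.dom.minidom` in the same way as the unsorted tree.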

Answer 1 (score: 0)

You can parse the input into a dictionary as follows:

strings = ["Sweden, Vastmanland, Vasteras",
"Sweden, Dalarna, Leksand", 
"Ireland, Cork, Cobh",
"Ireland, Clare, Boston",
"Ireland, Cork, Baltimore",
"Sweden, Dalarna, Mora" ]

d = {}
for s in strings:
    tmp = s.split(", ")
    country = tmp[0].strip()
    region = tmp[1].strip()
    settlement = tmp[2].strip()

    if d.get(country):
        if d[country].get(region):
            d[country][region].append(settlement)
        else:
            d[country][region] = [settlement]
    else:
        d[country] = {region: [settlement]} 

for k, v in d.items():
    print(k, v)  # print() is a function in Python 3

This gives the following output (the ordering of keys may vary with the Python version):

Sweden {'Vastmanland': ['Vasteras'], 'Dalarna': ['Leksand', 'Mora']}
Ireland {'Clare': ['Boston'], 'Cork': ['Cobh', 'Baltimore']}

Now you can easily convert this dict into an XML string.
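
The answer does not show that last step. A minimal sketch of it using the standard-library `xml.etree.ElementTree`, assuming the dict has the `{country: {region: [settlements]}}` shape printed above; `dict_to_xml` is a hypothetical helper name, not something from the original answer:

```python
import xml.etree.ElementTree as ET


def dict_to_xml(d):
    """Serialize a {country: {region: [settlement, ...]}} dict to an XML string."""
    root = ET.Element("data")
    for country, regions in d.items():
        country_el = ET.SubElement(root, "country", {"name": country})
        for region, settlements in regions.items():
            region_el = ET.SubElement(country_el, "region", {"name": region})
            for settlement in settlements:
                ET.SubElement(region_el, "settlement", {"name": settlement})
    # encoding="unicode" makes tostring() return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


# Same data as the dictionary printed above.
d = {
    "Sweden": {"Vastmanland": ["Vasteras"], "Dalarna": ["Leksand", "Mora"]},
    "Ireland": {"Clare": ["Boston"], "Cork": ["Cobh", "Baltimore"]},
}

xml_text = dict_to_xml(d)
print(xml_text)
```

Elements come out in whatever order the dict yields them, so wrap the loops in `sorted()` if a stable alphabetical order matters.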

That said, jamylak's answer is better.