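Parsing XML with the same tag and different attributes

Time: 2016-06-14 12:47:11

Tags: python lxml

I have a network file created from OSM using Netconvert. The root elements are edge elements with different attributes. For example, in the first part of the file, the edges are organized as follows.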

<edge id=":367367171_1" function="internal">
    <lane id=":367367171_1_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="15.86" shape="7413.68,8096.43 7409.39,8098.94 7406.50,8098.93 7405.03,8096.39 7404.96,8091.32"/>
</edge>
<edge id=":367367171_2" function="internal">
    <lane id=":367367171_2_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="9.40" shape="7413.68,8096.43 7412.34,8099.01 7410.83,8099.98 7409.14,8099.36 7407.28,8097.13"/>
</edge>
<edge id=":367367171_3" function="internal">
    <lane id=":367367171_3_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="5.56" shape="7408.25,8091.65 7407.28,8097.13"/>
</edge>
<edge id=":367367171_4" function="internal">
    <lane id=":367367171_4_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="5.69" shape="7408.25,8091.65 7408.69,8097.32"/>
</edge>

In the second part of the file, the attributes of the edge elements change, as shown below:

<edge id="102323265#13" from="1181188708" to="1181188720" priority="1" type="highway.cycleway">
    <lane id="102323265#13_0" index="0" allow="bicycle" speed="5.56" length="1.96" width="1.00" shape="14310.67,8986.24 14309.63,8984.59"/>
</edge>
<edge id="102323265#2" from="2577245263" to="1721713370" priority="1" type="highway.cycleway" shape="14903.54,9214.01 14891.64,9210.58 14796.11,9178.46 14789.16,9175.24">
    <lane id="102323265#2_0" index="0" allow="bicycle" speed="5.56" length="113.82" width="1.00" shape="14898.81,9213.21 14891.49,9211.10 14795.93,9178.98 14791.04,9176.72"/>
</edge>
<edge id="102323265#3" from="1721713370" to="1193980046" priority="1" type="highway.cycleway" shape="14789.16,9175.24 14783.34,9171.87 14779.91,9168.83 14776.75,9165.32">
    <lane id="102323265#3_0" index="0" allow="bicycle" speed="5.56" length="9.86" width="1.00" shape="14786.63,9174.41 14783.01,9172.31 14779.55,9169.24 14778.85,9168.47"/>
</edge>
<edge id="102323265#4" from="1193980046" to="1193980047" priority="1" type="highway.cycleway" shape="14776.75,9165.32 14764.89,9151.27 14762.54,9144.61">
    <lane id="102323265#4_0" index="0" allow="bicycle" speed="5.56" length="20.05" width="1.00" shape="14774.71,9163.77 14764.40,9151.55 14763.05,9147.72"/>
</edge>
<edge id="102323265#5" from="1193980047" to="1193980057" priority="1" type="highway.cycleway" shape="14762.54,9144.61 14760.31,9140.42 14753.93,9131.92 14749.20,9127.42 14743.90,9123.46 14738.81,9120.77 14731.67,9118.17 14707.61,9110.82">
    <lane id="102323265#5_0" index="0" allow="bicycle" speed="5.56" length="60.21" width="1.00" shape="14760.51,9141.98 14759.82,9140.67 14753.49,9132.25 14748.82,9127.82 14743.57,9123.90 14738.55,9121.26 14731.49,9118.68 14710.43,9112.25"/>
</edge>

As you can see, the edge elements have different attributes. When I try to access the elements with the following code,

for elem in netFile.iter(tag='edge'):
    print(elem.attrib['from'])

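I get KeyError: 'from'

When I change the key to 'function' instead of 'from', the code prints several lines of 'internal', and when it reaches the end of the first part it throws another error: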

KeyError: 'function'

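I know I have to iterate selectively over the edges where the 'from' attribute exists, but I don't know how to proceed. Can someone help?

Thanks

3 Answers:

Answer 0: (score: 2)

Python's dict get() method is very useful in these cases, because it returns None when the key is not found in the dict.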

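for elem in netFile.iter(tag='edge'):
    if elem.attrib.get('from') is not None:
        # the edge has a 'from' attribute: handle it here
        pass
    else:
        # the edge has no 'from' attribute (e.g. an internal edge)
        pass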
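As a minimal, self-contained sketch of the get() approach: the two-edge sample below is a hypothetical excerpt modeled on the question's data, and split_edges is an illustrative helper name, not something from the original code. It uses the standard library's xml.etree.ElementTree so that it runs without extra packages; lxml.etree elements expose the same iter() and get() methods.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one edge of each kind from the question.
SAMPLE = """<net>
  <edge id=":367367171_1" function="internal"/>
  <edge id="102323265#13" from="1181188708" to="1181188720"/>
</net>"""


def split_edges(root):
    """Return (ids of edges with a 'from' attribute, ids of edges without)."""
    with_from, without_from = [], []
    for elem in root.iter("edge"):
        # get() returns None for a missing attribute instead of raising KeyError
        if elem.get("from") is not None:
            with_from.append(elem.get("id"))
        else:
            without_from.append(elem.get("id"))
    return with_from, without_from


root = ET.fromstring(SAMPLE)
with_from, without_from = split_edges(root)
print(with_from)     # ids of edges that have 'from'
print(without_from)  # ids of edges that do not
```

Comparing against None rather than relying on truthiness keeps an attribute with an empty value from being treated as missing.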

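Answer 1: (score: 1)

You have tagged this lxml, so there is an easier way to iterate selectively over the edges where the 'from' attribute exists. You can use the XPath expression below to find all edges that have a from attribute: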

for e in root.xpath("//edge[@from]"):
    print(e.get("from"))

If you want to check for more than one attribute, you can use

root.xpath("//edge[@from and @function]")
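If lxml is not available, a similar attribute filter can be sketched with the standard library instead. This swaps lxml's xpath() for ElementTree's findall(), which supports only a limited XPath subset: the [@attrib] predicate is part of that subset, but as far as I know the `and` operator is not, so the sketch chains two predicates. The sample data, including the edge with id "broken", is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an internal edge, a regular edge, and an edge
# that has 'from' but lacks 'to'.
SAMPLE = """<net>
  <edge id=":367367171_1" function="internal"/>
  <edge id="102323265#13" from="1181188708" to="1181188720"/>
  <edge id="broken" from="1181188708"/>
</net>"""

root = ET.fromstring(SAMPLE)

# Edges that carry a 'from' attribute.
has_from = [e.get("id") for e in root.findall(".//edge[@from]")]

# Chained predicates: edges that carry both 'from' and 'to'.
has_from_and_to = [e.get("id") for e in root.findall(".//edge[@from][@to]")]

print(has_from)
print(has_from_and_to)
```

Both expressions should also be accepted by lxml's xpath(), so the same strings can be reused if the file is later parsed with lxml.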

Answer 2: (score: 0)

You can detect which part of the file you are dealing with by checking which attributes are present, for example:

# The !required! attributes for each part
part1_attributes = ["id", "function"]
part2_attributes = ["id", "from", "to", "priority", "type"]

for elem in netFile.iter(tag='edge'):
    if all([attr in elem.attrib for attr in part1_attributes]):
        # part 1
        print("function: " + elem.attrib["function"])
    elif all([attr in elem.attrib for attr in part2_attributes]):
        # part 2
        print("from: " + elem.attrib["from"])
    else:
        print("Unknown part found while parsing xml")
        # or raise Exception("message...") or exit program etc.

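If an edge is missing one of the required attributes, this approach catches it and reports an error (or just prints a message and continues), instead of returning None as in gr1zzly be4r's answer.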
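The same check can be wrapped in a small function that fails loudly on an unexpected edge. This is only a sketch: classify_edge and the labels "part1" and "part2" are illustrative names, the required-attribute tuples are taken from the answer above, and the sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Required attributes for each part of the file, as listed in the answer.
PART1_ATTRIBUTES = ("id", "function")
PART2_ATTRIBUTES = ("id", "from", "to", "priority", "type")


def classify_edge(elem):
    """Return 'part1' or 'part2'; raise ValueError for any other edge."""
    if all(attr in elem.attrib for attr in PART1_ATTRIBUTES):
        return "part1"
    if all(attr in elem.attrib for attr in PART2_ATTRIBUTES):
        return "part2"
    raise ValueError("unexpected attributes on edge: %r" % sorted(elem.attrib))


# Hypothetical sample elements: one per part, plus an incomplete edge.
internal = ET.fromstring('<edge id=":367367171_1" function="internal"/>')
regular = ET.fromstring(
    '<edge id="102323265#13" from="1181188708" to="1181188720" '
    'priority="1" type="highway.cycleway"/>'
)
incomplete = ET.fromstring('<edge id="x" from="1"/>')

print(classify_edge(internal))
print(classify_edge(regular))
```

Calling classify_edge(incomplete) raises ValueError, so a malformed edge stops the run instead of passing through unnoticed.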