How to identify elements at the same level in XML when parsing with Python

Time: 2018-07-29 23:46:54

Tags: python xml parsing

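I have been learning to parse XML with Python, and today I ran into another question.

How can I identify elements that sit at the same level (siblings) in an XML document?

Take this XML sample: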

<AAA>
    <BBB>1</BBB>
    <CCC>*</CCC>
    <BBB>1</BBB>  <--- need to remove
    <BBB>1</BBB>
    <CCC>*</CCC>
    <BBB>1</BBB>  <--- need to remove
</AAA>

I know how to remove an element on the first or last line, but how do I remove the BBB element that comes directly after a CCC?

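1 answer:

Answer 0 (score: 1)

Here is a solution using ElementTree. It collects the children of AAA in document order, compares each child with its predecessor, and removes every BBB that directly follows a CCC.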

from xml.etree import ElementTree as ET

XML = """ 
<AAA>
    <BBB>1</BBB>
    <CCC>*</CCC>
    <BBB>2</BBB>
    <BBB>3</BBB>
    <CCC>*</CCC>
    <BBB>4</BBB>
</AAA>"""

root = ET.fromstring(XML)

# All children of AAA (siblings in document order)
children = root.findall("*")  

# Find all BBB elements that immediately follow a CCC element
to_remove = []
for i in range(1, len(children)):
    curr = children[i]
    prev = children[i-1]
    if curr.tag == "BBB" and prev.tag == "CCC":
        to_remove.append(curr)

# Remove the found BBB elements 
for elem in to_remove:
    root.remove(elem)

print(ET.tostring(root).decode("UTF-8"))

Output:

<AAA>
    <BBB>1</BBB>
    <CCC>*</CCC>
    <BBB>3</BBB>
    <CCC>*</CCC>
    </AAA>
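
The closing `</AAA>` in the output above is indented because `root.remove()` discards the removed element's `tail` (the whitespace after its closing tag) together with the element. If the layout matters, a minimal sketch along the following lines could hand that tail to the last sibling that stays in the tree. `remove_following` is a hypothetical helper name chosen for illustration, not part of ElementTree, and it assumes the tags are plain strings:

```python
from xml.etree import ElementTree as ET


def remove_following(parent, tag, after_tag):
    """Remove each `tag` child that directly follows an `after_tag` child.

    Hypothetical helper (not an ElementTree API). The tail of every removed
    element is moved to the last sibling that was kept, so the whitespace
    between the remaining elements is preserved.
    """
    removed = 0
    kept = None      # last sibling that stays in the tree
    prev_tag = None  # tag of the preceding sibling in the original order
    for child in list(parent):  # iterate over a copy while mutating parent
        if child.tag == tag and prev_tag == after_tag:
            kept.tail = child.tail  # keep the line break and indentation
            parent.remove(child)
            removed += 1
        else:
            kept = child
        prev_tag = child.tag
    return removed


# Same sample document as in the answer above
XML = """<AAA>
    <BBB>1</BBB>
    <CCC>*</CCC>
    <BBB>2</BBB>
    <BBB>3</BBB>
    <CCC>*</CCC>
    <BBB>4</BBB>
</AAA>"""

root = ET.fromstring(XML)
remove_following(root, "BBB", "CCC")
print(ET.tostring(root, encoding="unicode"))
```

With this variant the closing `</AAA>` should land back at column zero, because the last CCC inherits the newline that used to follow the removed BBB. The decision still uses the original sibling order, so the same elements are removed as in the answer's loop.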