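How to get elements within a range in lxml

Time: 2019-06-04 09:09:21

Tags: xml python-3.x lxml

I have an XML file similar to the one below. I am trying to get the elements named "elem" whose "id" attribute falls within a given range.

For example: get all "elem" elements from id = 4 to id = 8.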

<all_levels>
<level1>
    <level2>
        <level3>
        <elem id="1"> </elem>
        <elem id="2"> </elem>
        </level3>
        <level3>
        <elem id="3"> </elem>
        <elem id="4"> </elem>
        </level3>
    </level2>
    <level2>
        <level3>
        <elem id="5"> </elem>
        <elem id="6"> </elem>
        </level3>
        <level3>
        <elem id="7"> </elem>
        <elem id="8"> </elem>
        </level3>
    </level2>
</level1>
<level1>
    <level2>
        <level3>
        <elem id="9"> </elem>
        <elem id="10"> </elem>
        </level3>
        <level3>
        <elem id="11"> </elem>
        <elem id="12"> </elem>
        </level3>
    </level2>
    <level2>
        <level3>
        <elem id="13"> </elem>
        <elem id="14"> </elem>
        </level3>
        <level3>
        <elem id="15"> </elem>
        <elem id="16"> </elem>
        </level3>
    </level2>
</level1>
</all_levels>

I have tried two approaches. 1) Use XPath to fetch each required "elem" element one at a time, e.g. for the range (4, 8):

from lxml import etree
sample_xml = etree.parse("sample_xml.xml")
# One XPath query per id; the element name must match the sample XML ("elem").
elem1 = sample_xml.xpath("//elem[@id = '%s']" % str(4))[0]
elem2 = sample_xml.xpath("//elem[@id = '%s']" % str(5))[0]
elem3 = sample_xml.xpath("//elem[@id = '%s']" % str(6))[0]
elem4 = sample_xml.xpath("//elem[@id = '%s']" % str(7))[0]
elem5 = sample_xml.xpath("//elem[@id = '%s']" % str(8))[0]

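However, when the range is large, fetching all the elements this way takes too long.

2) Use XPath to get the first elem in the range, then use the getnext() method to walk its siblings: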

from lxml import etree
sample_xml = etree.parse("sample_xml.xml")
elem1 = sample_xml.xpath("//elem[@id = '%s']" % str(4))[0]
elems = [elem1]
curr_elem = elem1
current_id = 4
while current_id < 8:
    # getnext() returns the next sibling, or None at the end of the parent
    curr_elem = curr_elem.getnext()
    if curr_elem is None:
        break
    elems.append(curr_elem)
    current_id += 1

The problem is that getnext() only returns siblings under the same parent element, so it cannot reach the remaining elements.
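
To make the limitation concrete, here is a minimal sketch on a trimmed-down copy of the sample document (the inline XML and the variable names are assumptions for illustration). It shows getnext() stopping at the last child of a parent, while iter(), which walks the tree in document order, continues into the next branch.

```python
from lxml import etree

# Trimmed-down copy of the sample document (assumed layout, ids 3 to 6 only).
xml = """
<all_levels>
  <level2>
    <level3><elem id="3"/><elem id="4"/></level3>
  </level2>
  <level2>
    <level3><elem id="5"/><elem id="6"/></level3>
  </level2>
</all_levels>
"""
root = etree.fromstring(xml)

elem3 = root.xpath("//elem[@id = '3']")[0]
elem4 = elem3.getnext()   # next sibling under the same <level3>
after4 = elem4.getnext()  # None: id=4 is the last child of its parent

# Document-order iteration is not limited to siblings of one parent.
ids_in_order = [e.get("id") for e in root.iter("elem")]
```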

Is there a better way than these XPath approaches to get the elements within a range?

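1 Answer:

Answer 0: (score: 1)

It turns out that XPath can efficiently select all "elem" elements whose "id" attribute lies within a given range.

Below are two methods. I used the IPython cell magic "%%time" to measure how long each one takes.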

from lxml import etree
sample_xml = etree.parse("sample_xml.xml")

Method 1:

%%time
start_heading_id = 4
ending_heading_id = 1000
elem1 = sample_xml.xpath("//elem[@id = '%s']" % str(start_heading_id))[0]
elems = [elem1]
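current_id = start_heading_id
# Stop before the last id so the loop never asks for ending_heading_id + 1.
while current_id < ending_heading_id:
    # Each iteration runs a separate XPath query for the next id.
    curr_elem = sample_xml.xpath("//elem[@id = '%s']" % str(current_id + 1))[0]
    elems.append(curr_elem)
    current_id += 1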

Output (13.2 seconds to fetch all the elements):

CPU times: user 13.2 s, sys: 23.6 ms, total: 13.2 s
Wall time: 13.2 s
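
Method 1 is most likely slow because every "//elem[...]" query searches the whole document again, so the cost grows with both the size of the range and the size of the document. If per-id lookups are really needed, one possible alternative is to walk the tree once and index the elements by id. This is a sketch, not part of the measurements above; the generated XML and the name by_id are assumptions for illustration, and it assumes every id is an integer.

```python
from lxml import etree

# Small stand-in document: 8 groups of two <elem> nodes, ids 1 to 16.
xml = "<root>" + "".join(
    '<group><elem id="%d"/><elem id="%d"/></group>' % (i, i + 1)
    for i in range(1, 17, 2)
) + "</root>"
root = etree.fromstring(xml)

# One pass over the tree: map each integer id to its element.
by_id = {int(e.get("id")): e for e in root.iter("elem")}

start_id, end_id = 4, 8
# Dictionary lookups replace the repeated XPath queries.
elems = [by_id[i] for i in range(start_id, end_id + 1) if i in by_id]
```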

Method 2:

%%time
start_heading_id = 4
ending_heading_id = 1000
# A single query: XPath compares @id numerically for >= and <=.
elems = sample_xml.xpath("//elem[@id >= %d and @id <= %d]" % (start_heading_id, ending_heading_id))

Output (0.0387 seconds to fetch all the elements):

CPU times: user 39.2 ms, sys: 1.25 ms, total: 40.5 ms
Wall time: 38.7 ms
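
The range comparison is numeric whether or not the bounds are quoted, since XPath 1.0 converts both operands of >= and <= to numbers. As a hedged sketch of the same idea, the bounds can also be passed as XPath variables, which lxml accepts as keyword arguments to xpath(), instead of being formatted into the query string. The helper name elems_in_range and the generated test document are assumptions for illustration.

```python
from lxml import etree

# Small stand-in document: 8 groups of two <elem> nodes, ids 1 to 16.
xml = "<root>" + "".join(
    '<group><elem id="%d"/><elem id="%d"/></group>' % (i, i + 1)
    for i in range(1, 17, 2)
) + "</root>"
root = etree.fromstring(xml)


def elems_in_range(tree, low, high):
    """Return <elem> nodes whose numeric id lies in [low, high]."""
    # $low and $high are XPath variables bound from the keyword arguments.
    return tree.xpath("//elem[@id >= $low and @id <= $high]", low=low, high=high)


elems = elems_in_range(root, 4, 8)
ids = [int(e.get("id")) for e in elems]
```

The result comes back in document order, so no extra sorting is needed when the ids already increase through the file.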