Python XPath - get text from two different child nodes in one operation

Date: 2016-11-26 22:33:44

Tags: python xpath

I have this HTML block (this is not the whole HTML):

<div class="self-clear">
    <div class="item-wrap self-clear float-left">
        <h2>
            Doran's Start
            <span class="notes tooltip"></span>
        </h2>
        <div class="self-clear item-group">
    <div class="main-items float-left ">
        <a href="/league-of-legends/item/dorans-ring-25">
            <div class="item ajax-tooltip {t:'Item',i:'25'}">
            <div class="item-title">
                <span class="ajax-tooltip {t:'Item',i:'25'}">Doran's Ring</span>
                <br>
            </div>
        </a>
    </div>

I want to get the text inside the <h2> and the text inside each <div> in a single operation; I tried this:

    for build_names in guide_page.xpath(".//div[@class='self-clear mb10']/div/div[2]/div/h2/text() and "
                                        ".//div[@class='self-clear mb10']/div/div[2]/div/div/div/a/div[2]"
                                        "/text()"):

However, this is not correct... Is it possible to do this?

I need the result of the operation above, but when I write it as two separate loops, I get output like:

String from the first for loop
String from the first for loop

String from the second for loop... etc, etc.

I want it to be:

String from the first loop
String from the second loop
String from the first loop
String from the second loop and so on

If I haven't explained myself clearly, please let me know. Thanks.

2 answers:

Answer 0 (score: 2)

You can use the union (|) operator to combine two XPath expressions into one:

query = '''.//div[@class='self-clear mb10']/div/div[2]/div/h2/text() | 
           .//div[@class='self-clear mb10']/div/div[2]/div/div/div/a/div[2]/text()'''
for build_names in guide_page.xpath(query):
    ....
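
As a minimal sketch of the union approach, the example below runs a combined query against a simplified, hypothetical piece of markup (the `item-wrap` structure and the item names are stand-ins, not the asker's real page). It assumes `lxml` is installed. A union returns its nodes in document order, so each heading comes back followed by the items listed under it, which is the interleaving the question asks for.

```python
from lxml import html

# Hypothetical, simplified markup modeled loosely on the question's snippet.
SAMPLE = """
<div class="self-clear">
  <div class="item-wrap">
    <h2>Doran's Start</h2>
    <div class="item-group">
      <a href="#"><span>Doran's Ring</span></a>
      <a href="#"><span>Health Potion</span></a>
    </div>
  </div>
  <div class="item-wrap">
    <h2>Core Items</h2>
    <div class="item-group">
      <a href="#"><span>Void Staff</span></a>
    </div>
  </div>
</div>
"""


def headings_and_items(markup):
    """Return heading and item texts from one union query, in document order."""
    root = html.fromstring(markup)
    query = (
        ".//div[@class='item-wrap']/h2/text()"
        " | "
        ".//div[@class='item-wrap']//a/span/text()"
    )
    # Strip whitespace and drop empty text nodes.
    return [text.strip() for text in root.xpath(query) if text.strip()]


names = headings_and_items(SAMPLE)
print(names)
```

Note that the result is one flat list: nothing in it marks which strings are headings and which are items, so this works best when the order alone is enough.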

Answer 1 (score: 0)

I don't know whether there are other factors behind this, so I can't explain much; I managed to solve it by doing this:

    for build_names in guide_page.xpath(".//div[@class='item-wrap self-clear float-left']"):
        for x in build_names.xpath("h2/text()"):
            print(x)

        for y in build_names.xpath("div/div/a/div[2]/span/text()"):
            print(y)

Output:

Doran's Start #1st
Boots of Speed #2nd
Health Potion #2nd
Warding Totem #2nd


Morellonomicon #1st
Sorcerer's Shoes #2nd
Rylai's Crystal Scepter #2nd
Rabadon's Deathcap #2nd
Void Staff #2nd
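
If the grouping matters, the same nested idea can collect each heading together with its items instead of printing them. This is only a sketch under assumptions: the markup is a simplified, hypothetical stand-in for the real page, `group_builds` is an illustrative name, and `lxml` is assumed to be installed.

```python
from lxml import html

# Hypothetical, simplified markup modeled loosely on the question's snippet.
SAMPLE = """
<div class="self-clear">
  <div class="item-wrap">
    <h2>Doran's Start</h2>
    <div class="item-group">
      <a href="#"><span>Doran's Ring</span></a>
      <a href="#"><span>Health Potion</span></a>
    </div>
  </div>
  <div class="item-wrap">
    <h2>Core Items</h2>
    <div class="item-group">
      <a href="#"><span>Void Staff</span></a>
    </div>
  </div>
</div>
"""


def group_builds(markup):
    """Return a list of (heading, [item names]) pairs, one per container."""
    root = html.fromstring(markup)
    groups = []
    for block in root.xpath(".//div[@class='item-wrap']"):
        # Relative XPath: evaluated against this container only.
        title = block.xpath("normalize-space(h2/text())")
        items = [t.strip() for t in block.xpath(".//a/span/text()")]
        groups.append((title, items))
    return groups


builds = group_builds(SAMPLE)
for title, items in builds:
    print(title, items)
```

Selecting the container first and then querying relative to it keeps every item tied to its own heading, at the cost of one extra XPath call per container.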