Question

I am trying to use the Python Selenium Firefox WebDriver to grab the h2 content 'My Data Title' from this HTML:

<div class="box">
    <ul class="navigation">
        <li class="live">
            <span>
                Section Details
            </span>
        </li>
    </ul>
</div>

<div class="box">
    <h2>
        My Data Title
    </h2>
</div>

<div class="box">
    <ul class="navigation">
        <li class="live">
            <span>
                Another Section
            </span>
        </li>
    </ul>
</div>

<div class="box">
    <h2>
        Another Title
    </h2>
</div>

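Every div has the class box, so I can't easily identify the one I want. Is there a way to tell Selenium to grab the h2 in the box div that comes after the one containing a span with the text 'Section Details'?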

Answer 1

Here is an XPath that selects the heading that follows the "Section Details" text:

//div[@class='box'][normalize-space(.)='Section Details']/following::h2

Answer 2

If you want to grab the h2 in the box that comes after the span with the text Section Details, try the XPath below, which uses preceding:

(//h2[preceding::span[normalize-space(text()) = 'Section Details']])[1]

or use following:

(//span[normalize-space(text()) = 'Section Details']/following::h2)[1]

For Another Section, just change the span text in the XPath:

(//h2[preceding::span[normalize-space(text()) = 'Another Section']])[1]

or

(//span[normalize-space(text()) = 'Another Section']/following::h2)[1]
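Since the two variants differ only in the span text, the expression can be built from a label. The following is a minimal sketch, not part of the original answer: the helper name `heading_after_span_xpath` is hypothetical, and it assumes the label contains no single quote.

```python
def heading_after_span_xpath(label: str) -> str:
    """Build the XPath for the first h2 that follows a span with this text."""
    # Assumption: the label has no single quote, so it can be embedded as-is
    # inside the single-quoted XPath string literal.
    if "'" in label:
        raise ValueError("label must not contain a single quote")
    return f"(//span[normalize-space(text()) = '{label}']/following::h2)[1]"


print(heading_after_span_xpath("Section Details"))
print(heading_after_span_xpath("Another Section"))
```

With Selenium 4 the returned string would be passed to `driver.find_element(By.XPATH, ...)`, and the `.text` of the element found should then be the heading.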

Answer 3

Yes, you need to do a somewhat more involved XPath search:

from selenium.webdriver.common.by import By

referenceElementList = driver.find_elements(By.XPATH, "//span")
for eachElement in referenceElementList:
    # strip() drops the newlines and indentation around the span text
    if eachElement.get_attribute("innerHTML").strip() == 'Section Details':
        elementYouWant = eachElement.find_element(By.XPATH, "../../../following-sibling::div/h2")
        break

elementYouWant.get_attribute("innerHTML") should give you "My Data Title"

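What my code does:

Find all span elements, wherever they are in the HTML, and store them in a list called referenceElementList;
Iterate over the span elements in referenceElementList one by one, looking for the one whose innerHTML attribute is 'Section Details'.
If there is a match, we have found the span. We navigate up three levels to the enclosing div[@class='box'], then find the next sibling of this div, which is the second div element,
and finally we take the h2 element inside that div.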
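The steps above can be checked without a browser. This is only a rough sketch using the standard library's `xml.etree.ElementTree` in place of Selenium: the `<root>` wrapper and the function name `heading_after` are assumptions added for illustration, and instead of walking up from the span it walks the top-level divs in order, which reaches the same sibling.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in a hypothetical <root> element
# so that it parses as a single XML document.
HTML = """<root>
<div class="box"><ul class="navigation"><li class="live"><span>
    Section Details
</span></li></ul></div>
<div class="box"><h2>
    My Data Title
</h2></div>
<div class="box"><ul class="navigation"><li class="live"><span>
    Another Section
</span></li></ul></div>
<div class="box"><h2>
    Another Title
</h2></div>
</root>"""


def heading_after(root, label):
    """Return the h2 text of the div that follows the div whose span matches label."""
    boxes = list(root)  # top-level divs in document order
    for index, box in enumerate(boxes[:-1]):
        span = box.find("./ul/li/span")
        # Collapse newlines and indentation before comparing the span text.
        if span is not None and " ".join((span.text or "").split()) == label:
            h2 = boxes[index + 1].find("h2")  # next sibling div, then its h2
            if h2 is not None:
                return " ".join((h2.text or "").split())
    return None


root = ET.fromstring(HTML)
print(heading_after(root, "Section Details"))
print(heading_after(root, "Another Section"))
```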

Please let me know whether my code works. I may have made a mistake somewhere when navigating upward.

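One potential difficulty: the innerHTML attribute may contain tabs, newlines, and spaces. In that case the text has to be normalized, for example with a regular expression, before the comparison.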
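As a minimal sketch of that filtering step (the helper name `normalize_ws` is hypothetical, not from the original answer), every run of whitespace can be collapsed to a single space and the ends trimmed:

```python
import re


def normalize_ws(raw: str) -> str:
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


# Roughly what innerHTML looks like for the span in the question.
raw_inner_html = "\n                Section Details\n            "
print(normalize_ws(raw_inner_html) == "Section Details")
```

This mirrors what `normalize-space()` does in the XPath answers, so the comparison no longer depends on how the HTML is indented.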

Python Selenium Webdriver - Grab div after specified one
