Question

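Setup

I am trying to scrape the infoboxes of the French regions on Wikipedia.

Specifically, I need to get the population of each region. Each region's population is given in the infobox on its Wikipedia page; see for example https://en.wikipedia.org/wiki/Mayotte.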

HTML

For the example page, the part of the infobox HTML I am interested in looks like this:

<tr class="mergedtoprow">
   <th colspan="2" style="text-align:center;text-align:left">Area
       <div style="font-weight:normal;display:inline;"></div></th></tr>
<tr class="mergedrow">
   <th scope="row">&nbsp;•&nbsp;Total</th> 
       <td>374&nbsp;km<sup>2</sup> (144&nbsp;sq&nbsp;mi)</td></tr>
<tr class="mergedtoprow">
   <th colspan="2" style="text-align:center;text-align:left">
       Population 
       <div style="font-weight:normal;display:inline;">
            (2017)
            <sup id="cite_ref-census_1-0" class="reference">
                 <a href="#cite_note-census-1">[1]</a>
            </sup>
       </div>
   </th>
</tr>
<tr class="mergedrow">
   <th scope="row">&nbsp;•&nbsp;Total</th>
   <td>256,518</td>
</tr>

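I need to get the population figure, 256,518.

Code

My plan is to select the tr that contains the string 'Population', then tell Selenium to select the tr that follows it.

The following code successfully selects the tr containing the string 'Population':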

# find_element (singular) returns one element that can be searched further
info_box = browser.find_element_by_css_selector('.infobox').find_element_by_xpath('tbody')

for row in info_box.find_elements_by_xpath('./tr'):
    if 'Population' in row.text:
        print(row)

Now, how do I tell Selenium to select the tr that comes after the selected tr?

Answer 1

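There is no need to loop over all the rows. You only need to select the required row.

Try the following lines of code to get the desired output: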

population = driver.find_element_by_xpath('//tr[contains(th, "Population")]/following-sibling::tr/td').text
print(population)
#  256,518
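
The value returned by .text is a string with a thousands separator, and on Selenium 4 the find_element_by_* helpers are no longer available. The following is a minimal sketch of how the same XPath might be used there and the result turned into an integer. The names POPULATION_XPATH, parse_population and get_population are illustrative and not part of Selenium, and the [1] predicate is an addition that limits the match to the row immediately after the header row.

```python
import re

# XPath from the answer, with [1] added so that only the row directly
# after the "Population" header row is considered (an assumption about
# the infobox layout shown in the question).
POPULATION_XPATH = '//tr[contains(th, "Population")]/following-sibling::tr[1]/td'


def parse_population(cell_text):
    """Turn infobox text such as '256,518' or '256,518[1]' into an int."""
    match = re.search(r"\d+(?:,\d{3})*", cell_text)
    if match is None:
        raise ValueError(f"no number found in {cell_text!r}")
    return int(match.group().replace(",", ""))


def get_population(driver):
    """Read the population cell through a Selenium 4 style driver.

    "xpath" is the string value of Selenium's By.XPATH constant, so this
    call is equivalent to driver.find_element(By.XPATH, POPULATION_XPATH).
    """
    cell = driver.find_element("xpath", POPULATION_XPATH)
    return parse_population(cell.text)
```

With a live browser session, get_population(driver) would be called after driver.get('https://en.wikipedia.org/wiki/Mayotte').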

Answer 2

I think this should be enough:

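# find_element (singular) returns one element that can be searched further
info_box = browser.find_element_by_css_selector('.infobox').find_element_by_xpath('tbody')
tr_data = info_box.find_elements_by_xpath('./tr')
# Stop one row early so that tr_data[index + 1] always exists
for index in range(len(tr_data) - 1):
    if 'Population' in tr_data[index].text:
        print(tr_data[index + 1].text)
        break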
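
The "row after the matching row" step can also be written without index arithmetic by pairing each row with its successor. This is only a sketch on plain strings; text_after is a hypothetical helper, and with Selenium the list would be built from the row elements, for example [row.text for row in tr_data].

```python
def text_after(row_texts, keyword):
    """Return the text of the row following the first row that contains keyword.

    Returns None when no row matches or the matching row is the last one.
    """
    # zip stops at the shorter sequence, so the last row is never paired
    # with a missing successor and no IndexError can occur.
    for current, following in zip(row_texts, row_texts[1:]):
        if keyword in current:
            return following
    return None


# Row texts roughly as the infobox in the question would render them
rows = [
    "Area",
    "• Total 374 km2 (144 sq mi)",
    "Population (2017)[1]",
    "• Total 256,518",
]
print(text_after(rows, "Population"))
```

Note that the text of the whole row includes the "• Total" label from its th, so the number still has to be split off if only the figure is wanted.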

Answer 3

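To extract the population, you can simply identify the <th> containing Population and then locate the next <tr> node, whose descendant <td> holds the population figure 256,518. You can use the following solution: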

print(driver.find_element_by_xpath("//th[contains(., 'Population')]//following::tr[1]//td").get_attribute("innerHTML"))
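
The same idea (find the header cell containing Population, then read the td of the next row) can be checked offline without a browser. The sketch below uses only the standard library's html.parser on a trimmed copy of the HTML from the question; SNIPPET and the NextRowCell class are illustrative names, and the parser assumes a flat table without nested rows.

```python
from html.parser import HTMLParser

# Trimmed copy of the infobox rows from the question, wrapped in <table>.
SNIPPET = """
<table>
<tr class="mergedtoprow"><th colspan="2">Area</th></tr>
<tr class="mergedrow"><th scope="row">&nbsp;•&nbsp;Total</th>
    <td>374&nbsp;km<sup>2</sup> (144&nbsp;sq&nbsp;mi)</td></tr>
<tr class="mergedtoprow"><th colspan="2">Population
    <div>(2017)<sup><a href="#cite_note-census-1">[1]</a></sup></div></th></tr>
<tr class="mergedrow"><th scope="row">&nbsp;•&nbsp;Total</th>
    <td>256,518</td></tr>
</table>
"""


class NextRowCell(HTMLParser):
    """Store the <td> text of the row that follows a <th> containing keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.result = None
        self._buffer = None        # collects text while inside <th> or <td>
        self._row_matches = False  # current row has a <th> with the keyword
        self._capture = False      # current row directly follows a matching row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Carry the match flag of the previous row over to this one.
            self._capture = self._row_matches
            self._row_matches = False
        elif tag == "th" or (tag == "td" and self._capture and self.result is None):
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag not in ("th", "td") or self._buffer is None:
            return
        text = "".join(self._buffer).strip()
        self._buffer = None
        if tag == "th":
            if self.keyword in text:
                self._row_matches = True
        else:
            self.result = text


parser = NextRowCell("Population")
parser.feed(SNIPPET)
parser.close()
print(parser.result)  # 256,518
```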
