Question

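I am trying to use Python with Selenium to retrieve the text 'Annual Report' and 'IPO Prospectus'.

I tried driver.find_elements_by_class_name('sic_highlight'), but because several tables share the same class name, it also prints everything from the other tables.

How can I print the 'Annual Report' and 'IPO Prospectus' text without searching the other tables?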

<table class="sic_table" cellspacing="1">
  <thead>
    <tr class="sic_tableTopRow">
      <th scope="col">Report Type</th>
      <th scope="col">Year Ended</th>
      <th scope="col">Download</th>
    </tr>
  </thead>
  <tbody>
      <tr class="sic_highlight">
        <th colspan="3" scope="col" class="sic_highlight">Annual Report</th>
      </tr>
        <tr>
          <th class="si_left">Annual Report&nbsp;2016</th>
          <td class="si_center">Jun 2016</td>
          <td class="si_center">
              <a href="some_link">Part 1(1.41 MB)</a><br>
          </td>
        ....
        ....
        </tr>
      <tr class="sic_highlight">
        <th colspan="3" scope="col" class="sic_highlight">IPO Prospectus</th>
      </tr>
        <tr>
          <th class="si_left">IPO Prospectus&nbsp;2011</th>
          <td class="si_center">Jul 2011</td>
          <td class="si_center">
              <a href="some_link">Part 1(5.10 MB)</a><br>
          </td>
        </tr>
  </tbody>
</table>

Answer 1

Use the following XPath:

 //table[@class='sic_table']/tbody/tr/th
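
To see what this expression selects, it can be tried offline against a simplified copy of the question's table. The sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The markup is a cleaned-up stand-in (an assumed body wrapper, with the non-breaking spaces and br tags dropped so the XML parser accepts it), not the real page. Under those assumptions the expression also matches row headers such as "Annual Report 2016", and restricting the tr to class sic_highlight leaves only the two section titles.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the table in the question.
# The <body> wrapper is an assumption made so that ".//table" can match.
PAGE = """
<body>
  <table class="sic_table">
    <thead>
      <tr class="sic_tableTopRow"><th>Report Type</th><th>Year Ended</th><th>Download</th></tr>
    </thead>
    <tbody>
      <tr class="sic_highlight"><th class="sic_highlight">Annual Report</th></tr>
      <tr><th class="si_left">Annual Report 2016</th><td>Jun 2016</td></tr>
      <tr class="sic_highlight"><th class="sic_highlight">IPO Prospectus</th></tr>
      <tr><th class="si_left">IPO Prospectus 2011</th><td>Jul 2011</td></tr>
    </tbody>
  </table>
</body>
"""

root = ET.fromstring(PAGE.strip())


def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [cell.text for cell in root.findall(path)]


# The expression from the answer (ElementTree needs the leading "." for "//").
all_body_headers = texts(".//table[@class='sic_table']/tbody/tr/th")

# Same expression, limited to rows carrying the sic_highlight class.
section_headers = texts(
    ".//table[@class='sic_table']/tbody/tr[@class='sic_highlight']/th"
)

print(all_body_headers)
print(section_headers)
```

In Selenium the narrowed expression would be written with a leading // instead of .// and passed to the driver's XPath lookup.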

Answer 2

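This XPath finds both texts in your HTML. Note that each string needs its own contains() test joined with "or"; the "|" operator only combines node sets, not strings. Try this:

XPath: //tr[@class="sic_highlight"]/th[contains(text(),"Annual Report") or contains(text(),"IPO Prospectus")]

driver.find_elements_by_xpath('//tr[@class="sic_highlight"]/th[contains(text(),"Annual Report") or contains(text(),"IPO Prospectus")]')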
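
A minimal sketch of how such an expression could be assembled for any list of labels, one contains() test per label. The helper names section_header_xpath and print_section_headers are made up for illustration, and the sketch assumes the labels contain no double quotes. It calls the generic find_elements(by, value) form, since newer Selenium releases dropped the find_elements_by_* helpers; the locator is passed as the plain string "xpath" so the block runs without importing Selenium.

```python
def section_header_xpath(labels):
    """Return an XPath for highlighted <th> cells whose text contains any label."""
    if not labels:
        raise ValueError("at least one label is required")
    if any('"' in label for label in labels):
        # Assumption: labels are plain text; quote escaping is not handled here.
        raise ValueError("labels must not contain double quotes")
    # XPath 1.0 has no "contains any of" function, so join one test per label.
    tests = " or ".join('contains(text(), "{}")'.format(label) for label in labels)
    return '//tr[@class="sic_highlight"]/th[{}]'.format(tests)


def print_section_headers(driver, labels):
    """Print the text of every matching header cell found by the driver."""
    # "xpath" is the string value of selenium's By.XPATH locator constant.
    for cell in driver.find_elements("xpath", section_header_xpath(labels)):
        print(cell.text)
```

With a live driver this would be called as print_section_headers(driver, ["Annual Report", "IPO Prospectus"]).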

Answer 3

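You say there are multiple tables on the page. Do you know the full path to this table? Get the full (a.k.a. absolute) path of each 'th' element and make a separate WebDriver call to find_element_by_xpath for each one.

That said, you usually don't want to locate elements by absolute path (such paths take a long time and are very brittle). So if possible (i.e. you or someone you know developed this page and has full control over the HTML), you should put an ID on that table, and then you can do the following:

driver.find_element_by_id('tableIdHere').find_elements_by_class_name('sic_highlight')

Or even better, put IDs on the two elements you want.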
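
A hedged sketch of the scoped lookup, assuming the table has been given an ID (tableIdHere is a placeholder, as above). In the posted markup both the tr and its th carry the sic_highlight class, so a class-name lookup would return the row and the cell and each title would appear twice; the CSS selector th.sic_highlight keeps only the cells. The function name section_titles is hypothetical, and the locator strings are the values of Selenium's By.ID and By.CSS_SELECTOR, written out so the block runs without importing Selenium.

```python
def section_titles(driver, table_id):
    """Return the text of the highlighted header cells inside one table."""
    # "id" is the string value of selenium's By.ID.
    table = driver.find_element("id", table_id)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    # Searching from the table element limits matches to its descendants.
    cells = table.find_elements("css selector", "th.sic_highlight")
    return [cell.text.strip() for cell in cells]
```

With a live driver, the titles could be printed with: for title in section_titles(driver, 'tableIdHere'): print(title)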

python selenium print 'th' from selected table
