Creating a loop to parse table data in Scrapy

Time: 2017-04-21 04:38:00

Tags: web-scraping scrapy

I'm trying to loop over the rows of the table in the HTML below. I'm using the XPath selector //*[@id="employee-table"]/tbody/tr, but it doesn't match anything.

<table id="employee-table" class="table table-striped table-bordered responsive-table dataTable no-footer" role="grid" aria-describedby="employee-table_info" style="width: 882px;">
<thead>
<tr role="row"><th class="sorting_asc" tabindex="0" aria-controls="employee-table" rowspan="1" colspan="1" aria-sort="ascending" aria-label=" Name : activate to sort column descending" style="width: 174px;"> Name </th><th class="sorting" tabindex="0" aria-controls="employee-table" rowspan="1" colspan="1" aria-label=" Year : activate to sort column ascending" style="width: 36px;"> Year </th><th class="sorting" tabindex="0" aria-controls="employee-table" rowspan="1" colspan="1" aria-label=" Title : activate to sort column ascending" style="width: 82px;"> Title </th><th class="sorting" tabindex="0" aria-controls="employee-table" rowspan="1" colspan="1" aria-label=" Agency : activate to sort column ascending" style="width: 192px;"> Agency </th><th class="sorting" tabindex="0" aria-controls="employee-table" rowspan="1" colspan="1" aria-label=" Location : activate to sort column ascending" style="width: 115px;"> Location </th><th class="sorting" tabindex="0" aria-controls="employee-table" rowspan="1" colspan="1" aria-label=" Salary : activate to sort column ascending" style="width: 50px;"> Salary </th></tr>
</thead>
<tbody>
<tr role="row" class="odd"><td class="sorting_1"><a href="/employees/veterans-health-administration/bharatkumar-a-g">A G. Bharatkumar</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Wisconsin</td><td>$335,000</td></tr><tr role="row" class="even"><td class="sorting_1"><a href="/employees/veterans-health-administration/roure-a-rafael">A Rafael Roure</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Florida</td><td>$333,634</td></tr><tr role="row" class="odd"><td class="sorting_1"><a href="/employees/veterans-health-administration/dumont-aaron-s">Aaron S. Dumont</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Louisiana</td><td>$330,302</td></tr><tr role="row" class="even"><td class="sorting_1"><a href="/employees/veterans-health-administration/andrews-aaron-t">Aaron T. Andrews</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Florida</td><td>$350,000</td></tr><tr role="row" class="odd"><td class="sorting_1"><a href="/employees/veterans-health-administration/elmi-abdolali">Abdolali Elmi</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>West Virginia</td><td>$325,056</td></tr><tr role="row" class="even"><td class="sorting_1"><a href="/employees/veterans-health-administration/haleem-abdul-a">Abdul A. 
Haleem</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Missouri</td><td>$351,056</td></tr><tr role="row" class="odd"><td class="sorting_1"><a href="/employees/veterans-health-administration/ward-abner-m">Abner M. Ward</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Hawaii</td><td>$337,756</td></tr><tr role="row" class="even"><td class="sorting_1"><a href="/employees/veterans-health-administration/cohen-adam-c">Adam C. Cohen</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Indiana</td><td>$340,000</td></tr><tr role="row" class="odd"><td class="sorting_1"><a href="/employees/veterans-health-administration/bakker-adam-j">Adam J. Bakker</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Minnesota</td><td>$325,980</td></tr><tr role="row" class="even"><td class="sorting_1"><a href="/employees/veterans-health-administration/bracha-adam-s">Adam S. Bracha</a></td><td>2015</td><td><a href="/employees/occupations/medical-officer">Medical Officer</a></td><td><a href="/employees/veterans-health-administration">Veterans Health Administration</a></td><td>Florida</td><td>$335,000</td></tr></tbody>
</table>

1 Answer:

Answer 0: (score: 2)

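Try //*[@id="employee-table"]/tr

Your XPath doesn't work because of the tbody step. The HTML you pasted was most likely copied from the browser's inspector, which shows the DOM after the browser has added <tbody>; the raw page that Scrapy downloads may not contain that element. Remove tbody from the expression and check whether you get the rows you want.

This is described in the Scrapy documentation: http://doc.scrapy.org/en/0.14/topics/firefox.html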

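"Firefox, in particular, is known for adding <tbody> elements to tables. Scrapy, on the other hand, does not modify the original page HTML, so you won't be able to extract any data if you use <tbody> in your XPath expressions."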
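To make the point above concrete, here is a minimal sketch of the row loop. It is not the asker's spider: it uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors so that it runs without Scrapy installed, and RAW_HTML is a hypothetical, trimmed copy of the table as the server might send it (no <tbody>). The names RAW_HTML, COLUMNS and extract_rows are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw page source: same table as in the question, but without
# the <tbody> element that the browser adds when it builds the DOM.
RAW_HTML = """
<html><body>
<table id="employee-table">
  <thead>
    <tr><th>Name</th><th>Year</th><th>Title</th><th>Agency</th><th>Location</th><th>Salary</th></tr>
  </thead>
  <tr><td><a href="/employees/veterans-health-administration/bharatkumar-a-g">A G. Bharatkumar</a></td><td>2015</td><td>Medical Officer</td><td>Veterans Health Administration</td><td>Wisconsin</td><td>$335,000</td></tr>
  <tr><td><a href="/employees/veterans-health-administration/roure-a-rafael">A Rafael Roure</a></td><td>2015</td><td>Medical Officer</td><td>Veterans Health Administration</td><td>Florida</td><td>$333,634</td></tr>
</table>
</body></html>
"""

COLUMNS = ["name", "year", "title", "agency", "location", "salary"]

# ElementTree paths must be relative, hence the leading "." before "//".
WITH_TBODY = ".//*[@id='employee-table']/tbody/tr"
WITHOUT_TBODY = ".//*[@id='employee-table']/tr"


def extract_rows(root, row_xpath):
    """Loop over the rows matched by row_xpath and return one dict per row."""
    rows = []
    for tr in root.findall(row_xpath):
        # itertext() also collects text nested inside <a> elements.
        cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
        if len(cells) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, cells)))
    return rows


root = ET.fromstring(RAW_HTML)
print(len(extract_rows(root, WITH_TBODY)))     # 0: there is no <tbody> to match
print(len(extract_rows(root, WITHOUT_TBODY)))  # 2: both data rows are found

# In a Scrapy callback the same loop would look roughly like:
#   for row in response.xpath('//*[@id="employee-table"]/tr'):
#       name = row.xpath('./td[1]//text()').get()
```

If you can't tell in advance whether the downloaded page contains <tbody>, //*[@id="employee-table"]//tr matches rows at any depth inside the table, at the cost of also matching the header row inside <thead>, which then has to be skipped.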