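I'm having trouble getting Scrapy to recognize the next-page link. When I query with the XPath //a, the link does not show up. I tried
response.xpath("//*[@id='nextpage']/a").extract()
with no luck, along with several other variations. I'm trying to extract the link
href="pdetail.php?instnum=2016230702&year=2016"
Here is the HTML: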
<div class=""><br>
<table width="95%" align="center">
<tbody><tr>
<td class=""></td>
<td align="center" class="">
<h3 style="" class="Header">
Detail Information For Instrument # 2016230701 In Year 2016 </h3>
</td>
<td class=""></td>
</tr>
<tr>
<td class=""><div style="float:left;margin-left:30px;" id="previouspage" class=""><a href="pdetail.php?instnum=2016230700&year=2016"><button style="font-size:18px;font-family: arial" type="button" class="">Previous Page</button></a> </div></td>
<td class=""></td>
<td class=""><div style="float:right;" id="nextpage" class=""><a href="pdetail.php?instnum=2016230702&year=2016"><button style="font-size:18px;font-family: arial" type="button" class="">Next Page</button></a></div></td>
</tr>
</tbody></table>
I run variations of the XPath and then get the following loop, where the page keeps calling back to itself:
2016-09-24 18:26:03 [scrapy] DEBUG: Crawled (200) <GET http://search.jeffersondeeds.com/pdetail.php?instnum=2016230701&year=2016&db=0&cnum=20> (referer: http://search.jeffersondeeds.com/pdetail.php?instnum=2016230701&year=2016&db=0&cnum=20)
Answer 0: (score: 0)
Try this XPath:
string(//*[@id="nextpage"]/a/@href)
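
Once the XPath returns the href, the value is still relative (pdetail.php?...), so it has to be resolved against the current page URL before it can be requested. The sketch below uses only the standard library to show that step with the URLs from the question. The helper name `resolve_next` is hypothetical, and the same-page check is an assumption about one way to guard against the self-referencing loop shown in the log, not a confirmed diagnosis of it. Inside a spider, `response.urljoin(href)` does roughly the same resolution.

```python
from urllib.parse import urljoin

# URL of the page being crawled, copied from the log line in the question.
page_url = (
    "http://search.jeffersondeeds.com/pdetail.php"
    "?instnum=2016230701&year=2016&db=0&cnum=20"
)

# Relative href found inside the element with id="nextpage".
next_href = "pdetail.php?instnum=2016230702&year=2016"


def resolve_next(current_url, href):
    """Return the absolute next-page URL.

    Returns None when the href is empty or resolves to the page
    that is already being processed (hypothetical loop guard).
    """
    if not href:
        return None
    absolute = urljoin(current_url, href)
    if absolute == current_url:
        return None
    return absolute


next_url = resolve_next(page_url, next_href)
print(next_url)
```

In a callback, the href would typically come from `response.xpath('string(//*[@id="nextpage"]/a/@href)').extract_first()`, and a non-None result would be passed to `scrapy.Request` with the same callback.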