Getting the Next Link with Scrapy

Date: 2016-09-24 21:07:24

Tags: xpath web-scraping scrapy

I'm having trouble getting Scrapy to recognize the next-page link. With the XPath //a, the link does not show up. I tried:

response.xpath("//*[@id='nextpage']/a").extract()

That had no luck, and neither did several other variations. I am trying to extract the link href="pdetail.php?instnum=2016230702&year=2016".

Here is the HTML:

<div class=""><br>
<table width="95%" align="center">
    <tbody><tr>
        <td class=""></td>
        <td align="center" class="">
            <h3 style="" class="Header">
                Detail Information For Instrument # 2016230701 In Year 2016            </h3>
        </td>

        <td class=""></td>
    </tr>
<tr>
    <td class=""><div style="float:left;margin-left:30px;" id="previouspage" class=""><a href="pdetail.php?instnum=2016230700&amp;year=2016"><button style="font-size:18px;font-family: arial" type="button" class="">Previous Page</button></a> </div></td>
    <td class=""></td>
    <td class=""><div style="float:right;" id="nextpage" class=""><a href="pdetail.php?instnum=2016230702&amp;year=2016"><button style="font-size:18px;font-family: arial" type="button" class="">Next Page</button></a></div></td>
</tr>
</tbody></table>

I ran variations of the XPath and then got the following loop, where the page calls back to itself:

2016-09-24 18:26:03 [scrapy] DEBUG: Crawled (200) <GET http://search.jeffersondeeds.com/pdetail.php?instnum=2016230701&year=2016&db=0&cnum=20> (referer: http://search.jeffersondeeds.com/pdetail.php?instnum=2016230701&year=2016&db=0&cnum=20)
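One possible explanation for a page whose referer is itself (an assumption, since the spider code is not shown): the XPath matched nothing, and an empty href was joined with the current URL, which resolves back to the same page. A minimal sketch using only the standard library; `resolve` is a hypothetical helper name, and the page URL is copied from the log line above:

```python
from urllib.parse import urljoin

# Page URL copied from the crawl log in the question.
PAGE_URL = (
    "http://search.jeffersondeeds.com/pdetail.php"
    "?instnum=2016230701&year=2016&db=0&cnum=20"
)


def resolve(page_url, href):
    """Resolve a (possibly empty) href against the page it was found on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # An empty href resolves to the current page, so following it
    # requests the same URL again.
    print(resolve(PAGE_URL, ""))
    # The real relative href replaces the last path segment and the query.
    print(resolve(PAGE_URL, "pdetail.php?instnum=2016230702&year=2016"))
```

If this is what is happening, checking that the extracted href is non-empty before yielding a new request would stop the loop.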

1 Answer:

Answer 0 (score: 0)

Try this XPath:

string(//*[@id="nextpage"]/a/@href)