Scrapy - crawling a website with javascript:__doPostBack pagination

Date: 2015-01-12 16:27:08

Tags: javascript pagination scrapy web-crawler dopostback

How can Scrapy follow "javascript:__doPostBack" links on a website? I have a CrawlSpider that works fine:

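from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class MySpider(CrawlSpider):
    name = 'myspider'
    allowed_domains = ['website']  # real domain omitted in the question
    start_urls = ['website/Category/']

    rules = (
        # Follow category overview pages (LinkExtractor replaces the deprecated SgmlLinkExtractor)
        Rule(LinkExtractor(allow='/Products/Overview/'), follow=True),
        # Parse every product detail page
        Rule(LinkExtractor(allow=('/Products/Details/', )), callback='parse_item'),
    )

    def parse_item(self, response):
        # Not shown in the question; minimal placeholder so the callback exists
        yield {'url': response.url}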

But the pagination links look like this:

<a id="MainContent_ProductsOverview1_rptPagesTop_btnPage_1" class="btnPage" href="javascript:__doPostBack('ctl00$MainContent$ProductsOverview1$rptPagesTop$ctl02$btnPage','')" >1</a>
<a id="MainContent_ProductsOverview1_rptPagesTop_btnPage_1" class="btnPage" href="javascript:__doPostBack('ctl00$MainContent$ProductsOverview1$rptPagesTop$ctl02$btnPage','')" >2</a>

and so on.
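In ASP.NET WebForms, __doPostBack(eventTarget, eventArgument) does not navigate to a URL. It copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields of the page's form and submits that form as a POST. So the only "parameters" a page link carries are those two strings. A minimal sketch for pulling them out of an href (the helper name parse_dopostback is hypothetical, and the regex assumes single-quoted arguments as in the links above):

```python
import re
from typing import Optional, Tuple

# Matches __doPostBack('target','argument'); assumes single-quoted arguments,
# as in the pagination links shown in the question.
_POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def parse_dopostback(href: str) -> Optional[Tuple[str, str]]:
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = _POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    href = ("javascript:__doPostBack("
            "'ctl00$MainContent$ProductsOverview1$rptPagesTop$ctl02$btnPage','')")
    print(parse_dopostback(href))
```

The first value is what goes into __EVENTTARGET; the second (empty here) goes into __EVENTARGUMENT.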

I know the FormRequest formdata examples, but I don't know how to get the parameters to send. Any help would be great.
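The POST that a postback sends consists of the form's hidden fields (typically __VIEWSTATE, __EVENTVALIDATION and similar, which the server needs back unchanged) plus __EVENTTARGET and __EVENTARGUMENT set from the link. A stdlib-only sketch of assembling that payload, assuming the page has a single form wrapping the content (usual for WebForms) and ignoring visible inputs such as text boxes; the function build_postback_formdata and the sample page are hypothetical:

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


class _HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> on the page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if (attr.get("type") or "").lower() == "hidden" and name:
            self.fields[name] = attr.get("value") or ""


def build_postback_formdata(html: str, event_target: str,
                            event_argument: str = "") -> Dict[str, str]:
    """Hidden fields of the page plus the two values __doPostBack would set."""
    collector = _HiddenInputCollector()
    collector.feed(html)
    formdata = dict(collector.fields)
    formdata["__EVENTTARGET"] = event_target
    formdata["__EVENTARGUMENT"] = event_argument
    return formdata


if __name__ == "__main__":
    # Made-up sample page; a real page carries much longer __VIEWSTATE values.
    page = ('<form method="post">'
            '<input type="hidden" name="__EVENTTARGET" value="" />'
            '<input type="hidden" name="__VIEWSTATE" value="abc123" />'
            '</form>')
    data = build_postback_formdata(
        page, "ctl00$MainContent$ProductsOverview1$rptPagesTop$ctl02$btnPage")
    print(urlencode(data))  # the POST body the browser would send
```

In Scrapy itself you would normally not parse the hidden fields by hand: FormRequest.from_response(response, formdata={'__EVENTTARGET': target, '__EVENTARGUMENT': argument}, dont_click=True, callback=...) reads the fields of the page's first form by default and overrides the two event fields; dont_click=True stops it from also submitting a submit button's value.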

Thanks :D

0 answers:

No answers yet.