How can I save the scraped page link into the item using Scrapy?

Time: 2016-10-09 04:26:27

Tags: python python-3.x scrapy

Here is my spider:

    rules = (
        Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = MovieNotifyItem()
        item['title'] = response.xpath('//h5[@class="col s12 light center teal darken-3 white-text"]/text()').extract_first()
        item['size'] = response.xpath('//*[@class="torrent-info"]//tr[1]/td[2]/text()').extract_first()
        item['catagory'] = response.xpath('//*[@class="torrent-info"]//tr[2]/td[2]/text()').extract_first()
        yield item

Now I want to save the page link that is crawled by this code into item['item_link']:

    rules = (
        Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
    )

How can I do that? Thanks in advance.
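For reference, one common way to capture the page address in a Scrapy callback is to read response.url, the URL of the page the callback is parsing. A minimal sketch of a full spider built around the code above; the spider name, start URL, and the movie_notify.items import path are placeholders, and it assumes MovieNotifyItem also declares an item_link field:

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    from movie_notify.items import MovieNotifyItem  # hypothetical project path


    class TorrentSpider(CrawlSpider):                # hypothetical spider name
        name = 'torrents'
        start_urls = ['http://www.example.com/']     # placeholder start URL

        rules = (
            Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'),
                 callback='parse_item', follow=True),
        )

        def parse_item(self, response):
            item = MovieNotifyItem()
            item['title'] = response.xpath('//h5[@class="col s12 light center teal darken-3 white-text"]/text()').extract_first()
            item['size'] = response.xpath('//*[@class="torrent-info"]//tr[1]/td[2]/text()').extract_first()
            item['catagory'] = response.xpath('//*[@class="torrent-info"]//tr[2]/td[2]/text()').extract_first()
            # response.url is the address of the page this callback received,
            # so storing it records the scraped page link (assumed item_link field)
            item['item_link'] = response.url
            yield item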

1 Answer:

Answer 0 (score: 0)

If I understand correctly, you are looking for requests:

    # this fragment assumes "import json" and a requests-style response object
    if response.status_code == 200:
        print('Yay, my response was: %s' % response.content)
        self.LastResponse = response
        self.LastJson = json.loads(response.text)
        return True
    else:
        print('Request returned %s error!' % response.status_code)
        # for debugging
        try:
            self.LastResponse = response
            self.LastJson = json.loads(response.text)
        except Exception:
            pass
        return False
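The fragment above appears to come from a method that wraps the requests library rather than Scrapy. For clarity, here is a self-contained sketch of the same pattern (fetch a URL, check status_code, parse the JSON body); the function name and the URL are invented for illustration:

    import json

    import requests


    def fetch_json(url):
        """Fetch a URL and return its parsed JSON body, or None on failure."""
        response = requests.get(url)
        if response.status_code == 200:
            print('Yay, my response was: %s' % response.content)
            return json.loads(response.text)
        print('Request returned %s error!' % response.status_code)
        return None


    # usage (placeholder URL)
    data = fetch_json('http://www.example.com/api/endpoint')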