Here is my spider:
rules = (
    Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
)
def parse_item(self, response):
    item = MovieNotifyItem()
    item['title'] = response.xpath('//h5[@class="col s12 light center teal darken-3 white-text"]/text()').extract_first()
    item['size'] = response.xpath('//*[@class="torrent-info"]//tr[1]/td[2]/text()').extract_first()
    item['catagory'] = response.xpath('//*[@class="torrent-info"]//tr[2]/td[2]/text()').extract_first()
    yield item
Now I want to save the page link into item['item_link'] for the items scraped by this code:
rules = (
    Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
)
How can I do that? Thanks in advance.
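For reference, a minimal sketch of one way this could be done, assuming MovieNotifyItem declares an item_link field (the field name is taken from the question); in Scrapy the URL of the page currently being parsed is exposed as response.url:

def parse_item(self, response):
    item = MovieNotifyItem()
    # ... title / size / catagory extraction as shown above ...
    # response.url holds the URL of the page this callback is parsing
    item['item_link'] = response.url
    yield item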
Answer 0 (score: 0)
If I understand correctly, you are looking for requests:
if response.status_code == 200:
    print('Yay, my response was: %s' % response.content)
    self.LastResponse = response
    self.LastJson = json.loads(response.text)
    return True
else:
    print("Request return " + str(response.status_code) + " error!")
    # for debugging
    try:
        self.LastResponse = response
        self.LastJson = json.loads(response.text)
    except:
        pass
    return False
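The snippet above looks like it belongs to a method of a client class (note the self.LastResponse / self.LastJson attributes). A minimal, self-contained sketch of how it might be used together with the requests library is shown below; the ApiClient and SendRequest names and the URL are placeholders, not from the original:

import json
import requests

class ApiClient:
    def SendRequest(self, url):
        # Issue the HTTP request, then apply the same status/JSON
        # bookkeeping shown in the answer above.
        response = requests.get(url)
        if response.status_code == 200:
            self.LastResponse = response
            self.LastJson = json.loads(response.text)
            return True
        print("Request return " + str(response.status_code) + " error!")
        # Keep the raw response around for debugging even on errors.
        try:
            self.LastResponse = response
            self.LastJson = json.loads(response.text)
        except ValueError:
            pass
        return False

# Hypothetical usage:
client = ApiClient()
if client.SendRequest('https://example.com/api/endpoint'):
    print(client.LastJson)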