I want to navigate through the pages of this web page, so I wrote the code below.
pageNav.py :
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        self.parseLink(response)

    def parseLink(self, response):
        print 'response from: ', response.url
        sel = Selector(response)
        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink)
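The Python code above does not work: the spider never follows the pagination links. However, the second spider I wrote below works fine, and I don't know why. Any suggestions?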
pageNav2.py :
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi2'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        sel = Selector(response)
        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink)
Answer 0 (score: 3)
You should change this:
    def parse(self, response):
        print 'response from: ', response.url
        self.parseLink(response)
to this:
    def parse(self, response):
        print 'response from: ', response.url
        for item in self.parseLink(response):
            yield item
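Without a return/yield statement, a function returns None. In the first spider, parse calls self.parseLink(response) but discards the result. Because parseLink contains yield, that call only creates a generator object whose body never runs, and parse itself returns None, so Scrapy receives no requests to follow.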
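The behavior can be reproduced without Scrapy. The following is a minimal sketch in plain Python, assuming hypothetical stand-ins: parse_link plays the role of parseLink, and simple strings take the place of Request objects. None of these names come from the original spiders.

```python
def parse_link(urls):
    """Hypothetical stand-in for parseLink: yields one 'request' per URL."""
    for url in urls:
        yield "Request(%s)" % url


def parse_broken(urls):
    # Calling a generator function only creates a generator object.
    # Its body never runs here, and this function implicitly returns None.
    parse_link(urls)


def parse_fixed(urls):
    # Iterating over the generator and re-yielding passes every item on.
    for item in parse_link(urls):
        yield item


urls = ["page2", "page3"]
broken_result = parse_broken(urls)
fixed_result = list(parse_fixed(urls))
print(broken_result)  # None
print(fixed_result)   # ['Request(page2)', 'Request(page3)']
```

On Python 3.3 or later, the loop in parse_fixed could also be written as `yield from parse_link(urls)`. The spiders in the question use Python 2 print statements, where that syntax is not available, so the explicit loop is the form that applies there.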