Why is my custom callback not called when I yield a Request, while the parse method is called?

Asked: 2014-05-04 11:34:03

Tags: python web-crawler scrapy

I want to navigate through the pages of this web page, so I wrote the code below:

pageNav.py

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        self.parseLink(response)

    def parseLink(self, response):
        print 'response from: ', response.url
        sel = Selector(response)

        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink) 

The Python code above does not work. However, the other spider I wrote, shown below, works fine, and I don't know why. Any suggestions?

pageNav2.py

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi2'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        sel = Selector(response)

        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink)

1 Answer:

Answer 0 (score: 3)

You should change this:

def parse(self, response):
    print 'response from: ', response.url
    self.parseLink(response)

to this:

def parse(self, response):
    print 'response from: ', response.url
    for item in self.parseLink(response):
        yield item

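Without a return/yield statement, a function returns None. In the original parse, calling self.parseLink(response) only creates a generator object; because nothing iterates over it, its body never runs, and parse hands no requests back to Scrapy.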
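
To see the same behaviour without Scrapy, here is a minimal plain-Python sketch. The names parse_link, parse_broken, parse_fixed and the example URL are hypothetical stand-ins for the spider methods above, not part of Scrapy; the point is only that a generator's body does not run until something iterates over it.

```python
# Plain-Python sketch (no Scrapy required); every name here is hypothetical.
calls = []  # records when the generator body actually executes


def parse_link(url):
    calls.append(url)              # runs only once the generator is iterated
    yield url + '?page=2'


def parse_broken(url):
    parse_link(url)                # builds a generator object and discards it
    # no return/yield statement, so this function returns None


def parse_fixed(url):
    for request in parse_link(url):    # iterating runs the generator body
        yield request


broken_result = parse_broken('http://example.com')
calls_after_broken = list(calls)   # snapshot: still empty at this point
fixed_result = list(parse_fixed('http://example.com'))
```

Here parse_broken mirrors the original parse: the call looks like it does work, but nothing is ever produced. parse_fixed mirrors the corrected version, which re-yields each item so the caller (Scrapy, in the spider) actually receives it.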