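I am building a CrawlSpider with Scrapy 0.22.2 on Python 2.7.3 and have run into a problem with Requests: the callback method I specify is never called. Here is a snippet of my parse method, which issues a Request in the elif block: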
        elif current_status == "Superseded":
            # Need to do more work here. Have to check whether there is a replacement unit available. If there isn't, download whatever outline is there
            # We need to look for a <td> element which contains "Is superseded by " and follow that link
            updated_unit = hxs.xpath('/html/body/div[@id="page"]/div[@id="layoutWrapper"]/div[@id="twoColLayoutWrapper"]/div[@id="twoColLayoutLeft"]/div[@class="layoutContentWrapper"]/div[@class="outer"]/div[@class="fieldset"]/div[@class="display-row"]/div[@class="display-row"]/div[@class="display-field-info"]/div[@class="t-widget t-grid"]/table/tbody/tr[1]/td[contains(., "Is superseded by ")]/a')
            # need child element a
            updated_unit_link = updated_unit.xpath('@href').extract()[0]
            updated_url = "http://training.gov.au" + updated_unit_link
            print "\033[0;31mSuperceded by "+updated_url+"\033[0m" # prints in Red for superseded, need to follow this link to current
            yield Request(url=updated_url, callback='sortSuperseded', dont_filter=True)

    def sortSuperseded(self, response):
        print "\033[0;35mtest callback called\033[0m"
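This runs without errors and the URL is fine, but sortSuperseded is never called: I never see 'test callback called' printed in the console.
The URL I extract is also within the domain I specified for the CrawlSpider.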
allowed_domains = ["training.gov.au"]
Where am I going wrong?
Answer 0 (score: 0)
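The callback method name should not be in quotes; pass the method itself rather than its name as a string. Change this line: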
yield Request(url=updated_url, callback='sortSuperseded', dont_filter=True)
to:
yield Request(updated_url, callback=self.sortSuperseded, dont_filter=True)
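The difference between the two lines can be seen without Scrapy at all. The following is only a minimal sketch: FakeRequest and DemoSpider are hypothetical stand-ins (not Scrapy classes), and the URL path is made up for illustration. It shows that the quoted name is just a string, while self.sortSuperseded is a bound method that can actually be invoked later.

```python
# Hypothetical stand-ins for illustration only; these are NOT Scrapy's classes.
class FakeRequest(object):
    """Stores a URL and a callback, loosely mimicking a request object."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class DemoSpider(object):
    def sortSuperseded(self, response):
        return "callback called for " + response

    def make_requests(self, url):
        # Quoted name: plain text, nothing that can be called later.
        wrong = FakeRequest(url, callback='sortSuperseded')
        # Bound method: carries both the function and the spider instance.
        right = FakeRequest(url, callback=self.sortSuperseded)
        return wrong, right


spider = DemoSpider()
# The path below is a made-up example, not a real page.
wrong, right = spider.make_requests("http://training.gov.au/example")

string_is_callable = callable(wrong.callback)   # the string cannot be called
method_is_callable = callable(right.callback)   # the bound method can
result = right.callback("fake response")        # invoke it as a framework would
```

Only the second request holds something a framework could call when the response arrives, which is why the quotes have to go.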