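I have the code below. When I use yield to request more links, I get this error:
Spider must return Request, BaseItem or None, got 'dict'
I have tried everything I can think of, but I cannot get rid of the error.
Here is the code: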
    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("//li[contains(concat(' ', @class, ' '), ' mod-searchresult-entry ')]")
        items = []
        for site in sites[:2]:
            item = SeekItem()
            item['title'] = myfilter(site.select('dl/dd/h2/a').select("string()").extract())
            item['link_url'] = myfilter(site.select('dl/dd/h2/em').select("string()").extract())
            item['description'] = myfilter(site.select('dl/dd/p').select("string()").extract())
            if item['link_url']:
                yield Request(urljoin('http://www.seek.com.au/', item['link_url']),
                              meta = item,
                              callback = self.parseItemDescription)
            yield item

    def parseItemDescription(self, response):
        item = response.meta
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("//li[contains(concat(' ', @class, ' '), ' mod-searchresult-entry ')]")
        item['description'] = "mytest"
        return item
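Answer 0 (score: 4)
Which version of Scrapy are you using? The documentation for 0.16.2 describes an approach for passing items to another callback: store the item under a key in request.meta and read it back from response.meta in the second callback.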
    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("//li[contains(concat(' ', @class, ' '), ' mod-searchresult-entry ')]")
        items = []
        for site in sites[:2]:
            item = SeekItem()
            item['title'] = myfilter(site.select('dl/dd/h2/a').select("string()").extract())
            item['link_url'] = myfilter(site.select('dl/dd/h2/em').select("string()").extract())
            item['description'] = myfilter(site.select('dl/dd/p').select("string()").extract())
            if item['link_url']:
                # Placeholder URL taken from the docs example; substitute the real target URL.
                request = Request("http://www.example.com/some_page.html", callback=self.parseItemDescription)
                # Store the item under a key in meta instead of passing the item as meta itself.
                request.meta['item'] = item
                # yield rather than return, so the loop continues with the remaining sites.
                yield request

    def parseItemDescription(self, response):
        # Read back the item that parse_items attached to the request.
        item = response.meta['item']
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("//li[contains(concat(' ', @class, ' '), ' mod-searchresult-entry ')]")
        item['description'] = "mytest"
        return item
Note: this is untested, because the rest of the code (the spider, items.py, etc.) is missing, so I am not sure how it will run.
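To illustrate why the key-in-meta approach might make a difference, here is a minimal sketch in plain Python. It assumes that the request copies whatever is passed as meta into a plain dict, which is my understanding of Scrapy's behaviour but is not confirmed in this thread. FakeItem, FakeRequest and is_valid_spider_output are hypothetical stand-ins, not Scrapy classes or functions.

```python
# Hypothetical stand-ins used only for illustration; these are NOT Scrapy classes.

class FakeItem(dict):
    """Stand-in for a Scrapy item: dict-like, but a distinct type."""


class FakeRequest:
    """Stand-in for Request. Assumption: meta is copied into a plain dict."""

    def __init__(self, url, meta=None):
        self.url = url
        self.meta = dict(meta) if meta else {}


def is_valid_spider_output(obj):
    """Mimics the check behind the error: only requests, items or None pass."""
    return obj is None or isinstance(obj, (FakeRequest, FakeItem))


item = FakeItem(title="Example job")

# Pattern from the question: the item itself is passed as meta.
bad = FakeRequest("http://www.seek.com.au/", meta=item)
returned_bad = bad.meta              # a plain dict copy, no longer a FakeItem
returned_bad["description"] = "mytest"

# Pattern from this answer: the item travels under a key inside meta.
good = FakeRequest("http://www.seek.com.au/", meta={"item": item})
returned_good = good.meta["item"]    # the original FakeItem object
returned_good["description"] = "mytest"

print(type(returned_bad).__name__, is_valid_spider_output(returned_bad))
print(type(returned_good).__name__, is_valid_spider_output(returned_good))
```

Under that assumption, the second callback in the question would be returning a plain dict rather than a SeekItem, which matches the wording of the error message.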
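Answer 1 (score: 3)
You are yielding twice: the first time a Request, the second time a dict (yield Request(...) and yield item).
I guess the second yield is unnecessary and should be removed. Try deleting the line that says yield item and comment below.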
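One way to read this suggestion is that each search result should produce either a follow-up request or a finished item, but not both. The sketch below shows that shape with a plain Python generator. FakeRequest and the sample entries are hypothetical stand-ins, and the else branch (which still emits items that have no link) is my own addition rather than part of this answer.

```python
# Hypothetical stand-in used only for illustration; NOT a Scrapy class.
class FakeRequest:
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = dict(meta) if meta else {}


def parse_items(entries):
    """Yield a follow-up request when an entry has a link, otherwise the item."""
    for entry in entries:
        item = {"title": entry["title"], "link_url": entry.get("link_url")}
        if item["link_url"]:
            # The item is finished later, in the callback that handles this request.
            yield FakeRequest(item["link_url"], meta={"item": item})
        else:
            # Nothing to follow, so the item is emitted right away.
            yield item


results = list(parse_items([
    {"title": "with link", "link_url": "http://www.seek.com.au/"},
    {"title": "no link"},
]))

for result in results:
    print(type(result).__name__)
```

With this shape an item is never emitted before its description callback has run, so it should not appear twice in the output.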