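Even though I have several items per request, only the first one (per request) makes it into the pipeline and actually gets saved as a Django model instance.
Here is my code. What am I missing?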
# my_spider.py
class MySpider(CrawlSpider):
    name = 'my_spider'
    ...
    def parse(self, response):
        x = HtmlXPathSelector(response)
        item = MyDjangoItem()
        headings = x.select('//h2/text()').extract()
        for h in headings:
            item['name'] = h
            yield item
        url = 'http://example.com/next'  # I have custom rules for constructing (not extracting) next url
        yield Request(url, callback=self.parse)
# pipelines.py
class MyPipeline(object):
    def process_item(self, item, spider):
        if spider.name == 'my_spider':
            if item['name']:
                item.save()
        return item
Answer 0 (score: 9)
You need to move the MyDjangoItem instantiation inside the for loop; otherwise the spider yields the same object on every iteration.
# my_spider.py
class MySpider(CrawlSpider):
    name = 'my_spider'
    ...
    def parse(self, response):
        x = HtmlXPathSelector(response)
        headings = x.select('//h2/text()').extract()
        for h in headings:
            item = MyDjangoItem()  # a fresh item for each heading
            item['name'] = h
            yield item
        url = 'http://example.com/next'  # I have custom rules for constructing (not extracting) next url
        yield Request(url, callback=self.parse)
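To see why the placement of the instantiation matters, here is a minimal sketch in plain Python, with no Scrapy or Django involved. `FakeItem` is a hypothetical stand-in for `MyDjangoItem` (just a dict subclass), and `shared_item` / `fresh_items` are illustrative names, not part of any library. It only demonstrates the object-identity issue: a generator that creates the item once hands out the same mutable object every time, which would be consistent with only one record ending up saved.

```python
# FakeItem is a hypothetical stand-in for MyDjangoItem (assumption for illustration).
class FakeItem(dict):
    pass


def shared_item(headings):
    """Mimics the question: the item is created once, outside the loop."""
    item = FakeItem()
    for h in headings:
        item['name'] = h  # overwrites the field on the same object
        yield item


def fresh_items(headings):
    """Mimics the answer: a new item is created for every heading."""
    for h in headings:
        item = FakeItem()
        item['name'] = h
        yield item


headings = ['first', 'second', 'third']

shared = list(shared_item(headings))
fresh = list(fresh_items(headings))

# Count distinct objects and collect the names each version ends up with.
shared_ids = len({id(i) for i in shared})
fresh_ids = len({id(i) for i in fresh})
shared_names = [i['name'] for i in shared]
fresh_names = [i['name'] for i in fresh]

print(shared_ids, shared_names)
print(fresh_ids, fresh_names)
```

In the shared version every yielded reference points at one object, so anything that processes the items later sees only the last value written. Creating the item inside the loop gives each heading its own object.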