When the parse function returns an item through the Item Loader's load_item method, the Item Pipeline does not work:
def parse(self, response):
    DIV_SELECTOR = '.Content'
    SET_SELECTOR = '.Meta'
    for div in response.css(DIV_SELECTOR):
        rowSelector = div.css(SET_SELECTOR)
        # Named `loader` so it does not shadow the ItemAAA item class
        loader = ItemLoader(item=ItemAAA(), selector=rowSelector)
        loader.add_css('name', 'a ::text')
        loader.add_css('url', 'a ::attr(href)')
        return loader.load_item()
Scrapy does recognize the pipeline:
2017-01-10 18:25:48 [scrapy.middleware] INFO: Enabled item pipelines: ['pipeline.DuplicatesPipeline']
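That "Enabled item pipelines" line reflects the ITEM_PIPELINES setting. Here is a sketch of a settings.py entry that would produce it. It assumes the pipeline lives in a module named pipeline, as the path in the log suggests, and it uses an arbitrary order value of 300:

```python
# settings.py (sketch): the key is the pipeline class's import path and the
# value is its run order (customarily 0-1000, lower values run first).
# 300 is an arbitrary choice for this example.
ITEM_PIPELINES = {
    'pipeline.DuplicatesPipeline': 300,
}
```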
When the parse function yields a dict instead, the pipeline works:
def parse(self, response):
    SET_SELECTOR = '.Meta'
    NAME_SELECTOR = 'a ::text'
    for tt in response.css(SET_SELECTOR):
        yield {
            'name': tt.css(NAME_SELECTOR).extract_first(),
        }
Pipeline.py:

from scrapy.exceptions import DropItem


class DuplicatesPipeline(object):

    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        if item['name'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.ids_seen.add(item['name'])
            return item
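You can test the pipeline's deduplication logic on its own with the minimal sketch below, which runs outside Scrapy. It makes two assumptions. First, a local DropItem class stands in for scrapy.exceptions.DropItem. Second, a hypothetical run_pipeline helper feeds items through process_item roughly the way the engine would. Neither of these is Scrapy's API.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem so this runs without Scrapy."""


class DuplicatesPipeline(object):
    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        if item['name'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item)
        self.ids_seen.add(item['name'])
        return item


def run_pipeline(pipeline, items):
    """Hypothetical helper: pass each item through the pipeline, keep survivors."""
    kept = []
    for item in items:
        try:
            kept.append(pipeline.process_item(item, spider=None))
        except DropItem:
            pass  # Scrapy would log the dropped item; here it is simply skipped
    return kept


kept = run_pipeline(DuplicatesPipeline(), [{'name': 'a'}, {'name': 'b'}, {'name': 'a'}])
print(kept)  # [{'name': 'a'}, {'name': 'b'}]
```

This sketch uses plain string names. With an ItemLoader, field values are collected as lists by default (the default output processor is Identity) unless the item's fields set an output processor such as TakeFirst. A list cannot be looked up in a set, so in that case `item['name'] in self.ids_seen` would raise a TypeError.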
I'm using Python 3.5.2 and Scrapy 1.3 (installed via Anaconda) on Windows 7.
Answer 0 (score: 0)
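You are probably returning only one item from your parse() method, because the return statement exits the loop on its first iteration. To fix this, turn your method into a generator by using yield instead of return on the load_item() line.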
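The sketch below shows the difference without Scrapy. Plain strings stand in for the '.Meta' selections, and the two function names are hypothetical:

```python
def parse_with_return(rows):
    """Mimics the question's parse(): `return` inside the loop exits on the first row."""
    for row in rows:
        return {'name': row}


def parse_with_yield(rows):
    """Generator version: `yield` hands back one item per row and keeps looping."""
    for row in rows:
        yield {'name': row}


rows = ['first', 'second', 'third']  # hypothetical stand-ins for the selector matches

result = parse_with_return(rows)
print(result)  # {'name': 'first'}

items = list(parse_with_yield(rows))
print(items)  # [{'name': 'first'}, {'name': 'second'}, {'name': 'third'}]
```

For the question's spider, the only change needed is the keyword in front of `load_item()`: change `return` to `yield`. Scrapy then iterates over the generator and sends every item to the pipelines.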