I'm trying to get Scrapy to write its results to an S3 bucket. I have the following in my settings file:
ITEM_PIPELINES = {
    'scrapy.pipelines.files.S3FilesStore': 1
}
FEED_URI = 's3://1001-results-bucket/results.json'
FEED_FORMAT = 'json'
My parse function is very simple:
class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        for n in range(0, 1):
            request = scrapy.FormRequest("https://website", formdata={'id': "%s" % n})
            yield request

    def parse(self, response):
        yield {
            'foo': 'bar'
        }
I get the following error:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/middleware.py", line 40, in from_settings
    mw = mwcls()
TypeError: __init__() missing 1 required positional argument: 'uri'
Any ideas?
Answer 0 (score: 0)
I was able to solve this by creating a custom pipeline, and it seems to work well.
settings.py
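The original settings snippet appears to have been lost from this answer. A minimal sketch of what it might look like, assuming the custom pipeline is named `S3Pipeline` in a project called `myproject` (both names are placeholders, not from the original):

```python
# Hypothetical sketch -- the answer's original settings.py was not preserved.
# Register a custom pipeline class instead of S3FilesStore, which is a files
# storage backend (its __init__ takes a 'uri' argument), not an item pipeline;
# listing it under ITEM_PIPELINES is what triggered the TypeError above.
ITEM_PIPELINES = {
    'myproject.pipelines.S3Pipeline': 300,
}

# Placeholder credentials; in practice these usually come from the
# environment or an IAM role rather than being hard-coded.
AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'
```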
pipelines.py
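The pipeline code is also missing from the answer as preserved. A minimal sketch of a custom pipeline that buffers items and uploads them to the bucket as one JSON file when the spider closes, assuming boto3 is available; the class name, bucket, and key are placeholders:

```python
# Hypothetical sketch -- the answer's original pipelines.py was not preserved.
import json


class S3Pipeline:
    """Collect scraped items and upload them to S3 as a single JSON file."""

    def __init__(self):
        self.items = []

    def process_item(self, item, spider):
        # Buffer each item; Scrapy calls this once per yielded item.
        self.items.append(dict(item))
        return item

    def serialize(self):
        # JSON body that will be written to the S3 object.
        return json.dumps(self.items)

    def close_spider(self, spider):
        # Imported lazily so the module loads even without AWS dependencies.
        import boto3
        s3 = boto3.client('s3')
        s3.put_object(
            Bucket='1001-results-bucket',   # bucket from the question's FEED_URI
            Key='results.json',
            Body=self.serialize().encode('utf-8'),
        )
```

Note that for a plain JSON feed, Scrapy's built-in feed exports (the `FEED_URI`/`FEED_FORMAT` settings from the question, with botocore installed and no pipeline entry) can also write to S3 directly; the custom pipeline is simply the route this answer took.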