I'm trying to get Scrapy to write its results to an S3 bucket. I have the following in my settings file:
ITEM_PIPELINES = {
    'scrapy.pipelines.files.S3FilesStore': 1
}
FEED_URI = 's3://1001-results-bucket/results.json'
FEED_FORMAT = 'json'
My parse function is very simple:
class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        for n in range(0, 1):
            request = scrapy.FormRequest("https://website", formdata={'id': "%s" % n})
            yield request

    def parse(self, response):
        yield {
            'foo': 'bar'
        }
I get the following error:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/core/scraper.py", line 71, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scrapy/middleware.py", line 40, in from_settings
    mw = mwcls()
TypeError: __init__() missing 1 required positional argument: 'uri'
Any ideas?
Answer 0 (score: 0)
I was able to solve this by creating a custom pipeline, and it seems to work well.
settings.py
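The original settings snippet appears to have been lost from this answer. A minimal sketch of what it might look like, assuming the custom pipeline is named `S3Pipeline` in a project called `myproject` (both names are placeholders, not from the original):

```python
# Hypothetical sketch -- the answer's original settings.py was not preserved.
# Register a custom pipeline class instead of S3FilesStore, which is a files
# storage backend (its __init__ takes a 'uri' argument), not an item pipeline;
# listing it under ITEM_PIPELINES is what triggered the TypeError above.
ITEM_PIPELINES = {
    'myproject.pipelines.S3Pipeline': 300,
}

# Placeholder credentials; in practice these usually come from the
# environment or an IAM role rather than being hard-coded.
AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'
```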
pipelines.py
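The pipeline code is also missing from the answer as preserved. A minimal sketch of a custom pipeline that buffers items and uploads them to the bucket as one JSON file when the spider closes, assuming boto3 is available; the class name, bucket, and key are placeholders:

```python
# Hypothetical sketch -- the answer's original pipelines.py was not preserved.
import json


class S3Pipeline:
    """Collect scraped items and upload them to S3 as a single JSON file."""

    def __init__(self):
        self.items = []

    def process_item(self, item, spider):
        # Buffer each item; Scrapy calls this once per yielded item.
        self.items.append(dict(item))
        return item

    def serialize(self):
        # JSON body that will be written to the S3 object.
        return json.dumps(self.items)

    def close_spider(self, spider):
        # Imported lazily so the module loads even without AWS dependencies.
        import boto3
        s3 = boto3.client('s3')
        s3.put_object(
            Bucket='1001-results-bucket',   # bucket from the question's FEED_URI
            Key='results.json',
            Body=self.serialize().encode('utf-8'),
        )
```

Note that for a plain JSON feed, Scrapy's built-in feed exports (the `FEED_URI`/`FEED_FORMAT` settings from the question, with botocore installed and no pipeline entry) can also write to S3 directly; the custom pipeline is simply the route this answer took.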