I am using the module https://github.com/aivarsk/scrapy-proxies. When PROXY_LIST points to an existing txt file on my PC, the settings work perfectly there.
For Scrapy Cloud I have tried several different approaches in settings.py.
I added the file proxylist.txt to the same folder as the project settings, and I also uploaded it to https://dl.dropboxusercontent.com/s/esdm19mnvz2yguf/proxylist.txt
I then set the name to one of the following:
PROXY_LIST = 'https://dl.dropboxusercontent.com/s/esdm19mnvz2yguf/proxylist.txt'
or PROXY_LIST = 'proxylist.txt'
or PROXY_LIST = '/proxylist.txt'
or PROXY_LIST = '../proxylist.txt'
With PROXY_LIST = 'proxylist.txt' it works like a charm on my PC, but not once I deploy it to Scrapy Cloud.
There I get this error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 36, in from_settings
    mw = mwcls.from_crawler(crawler)
  File "/app/python/lib/python2.7/site-packages/scrapy_proxies/randomproxy.py", line 55, in from_crawler
    return cls(crawler.settings)
  File "/app/python/lib/python2.7/site-packages/scrapy_proxies/randomproxy.py", line 35, in __init__
    fin = open(self.proxy_list)
IOError: [Errno 2] No such file or directory: '../proxylist.txt'
I need some help.
Answer 0 (score: 0)
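Most likely you are not including this file in your setup.py instructions, so it is never packaged with the project.
The mechanism for this is the MANIFEST.in file. It is quite simple: MANIFEST.in is really just a list of relative file paths, each specifying a file or glob to include: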
include README.rst
include docs/*.txt
include funniest/data.json
For these files to be copied at install time into the package's folder inside site-packages, you also need to pass include_package_data=True to the setup() function.
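As a rough sketch of what such a setup.py could look like for this project: the package name "myproject" is a placeholder, and the file is assumed to sit inside the package directory next to settings.py, as described in the question. The package_data and zip_safe entries are additions not mentioned above; they are included on the assumption that listing the file explicitly and installing the package unzipped makes it easier for open() to find the file on disk.

```python
# setup.py (sketch). "myproject" is a placeholder for the real package name.
SETUP_KWARGS = dict(
    name="myproject",
    version="1.0",
    packages=["myproject"],
    # Tells Scrapy which module holds the project settings.
    entry_points={"scrapy": ["settings = myproject.settings"]},
    # Install data files inside the package that MANIFEST.in lists.
    include_package_data=True,
    # Assumption: also name the proxy list explicitly as package data.
    package_data={"myproject": ["proxylist.txt"]},
    # Assumption: install as a plain directory so the file has a real path.
    zip_safe=False,
)


def main():
    # Imported here so the arguments above can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)


# A real setup.py would end by calling main().
```

With this layout the matching MANIFEST.in line would be "include myproject/proxylist.txt".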
See http://python-packaging.readthedocs.io/en/latest/non-code-files.html
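Even once the file is packaged, a relative value such as '../proxylist.txt' is resolved against the process's working directory, which on Scrapy Cloud need not be the project folder. One possible way around this, sketched below, is to build the path from the location of settings.py itself. The helper name package_file is hypothetical, and this assumes the package is installed unzipped so that the file exists as a regular file.

```python
import os


def package_file(module_file, name):
    """Return the absolute path of `name` in the same directory as `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, name)
```

In settings.py this would be used as PROXY_LIST = package_file(__file__, 'proxylist.txt'), which gives the middleware's open() call an absolute path regardless of the working directory.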