I have a scrapy + Selenium spider wrapped in a docker container. I want to run that container, passing some arguments to the spider. However, for some reason I get a strange error message. I searched extensively and tried many different options before submitting this question.
Dockerfile
FROM python:2.7
# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable
# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
# install xvfb
RUN apt-get install -yqq xvfb
# install pyvirtualdisplay
RUN pip install pyvirtualdisplay
# set display port and dbus env to avoid hanging
ENV DISPLAY=:99
ENV DBUS_SESSION_BUS_ADDRESS=/dev/null
# install scrapy
RUN pip install --upgrade pip && \
    pip install --upgrade \
        setuptools \
        wheel && \
    pip install --upgrade scrapy
# install selenium
RUN pip install selenium==3.8.0
# install xlrd
RUN pip install xlrd
# install bs4
RUN pip install beautifulsoup4
ADD . /tralala/
WORKDIR tralala/
CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI
I guess the problem may be in the CMD part.
The spider's init part:
class Crawler(scrapy.Spider):
    name = "personel_spider_mpc"
    allowed_domains = ['tralala.de',]

    def __init__(self, vdisplay = True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo
How I run the container:
docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala
I tried both with the quotation marks and without them.
The error message:
2018-04-04 16:42:32 [twisted] CRITICAL:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
six.reraise(*exc_info)
File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
self.spider = self._create_spider(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
return self.spidercls.from_crawler(self, *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
spider = cls(*args, **kwargs)
File "/tralala/tralala/spiders/tralala_spider_mpc.py", line 673, in __init__
self.chunkI = chunkI
NameError: global name 'chunkI' is not defined
Answer (score: 1)
Your arguments are stored in kwargs, which is just a dictionary mapping argument names to argument values. It does not define any names for you, hence the error.
For more details, see this answer.
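To illustrate: `**kwargs` only collects keyword arguments into a dict; it never creates bare local names, which is why referencing `chunkI` directly raises `NameError`. A minimal stand-alone sketch:

```python
def init(**kwargs):
    # kwargs is a plain dict: {'chunkI': '1', 'chunksNo': '10'}
    return kwargs

d = init(chunkI='1', chunksNo='10')
print(d['chunkI'])  # the values are only reachable through the dict
```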
In your specific case, try self.chunkI = kwargs['chunkI'] and self.chunksNo = kwargs['chunksNo'].
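Putting it together, here is a sketch of the corrected __init__ (a plain-Python stand-in class replaces scrapy.Spider so the snippet runs on its own; in the real spider you would keep inheriting from scrapy.Spider):

```python
class Spider(object):
    """Stand-in for scrapy.Spider, only so this sketch is self-contained."""
    def __init__(self, **kwargs):
        pass

class Crawler(Spider):
    name = "personel_spider_mpc"

    def __init__(self, vdisplay=True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        # Read the -a arguments out of kwargs instead of
        # referencing undefined bare names:
        self.chunkI = kwargs['chunkI']
        self.chunksNo = kwargs['chunksNo']

c = Crawler(chunkI='1', chunksNo='10')
print(c.chunkI, c.chunksNo)  # prints: 1 10
```

Note that scrapy passes each -a argument as a keyword argument to the spider's constructor, so this is exactly the dict the CMD line feeds in.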