Passing arguments to a Scrapy spider through docker run

Asked: 2018-04-04 16:51:28

Tags: python docker scrapy

I have a Scrapy + Selenium spider wrapped in a Docker container. I want to run that container while passing some arguments to the spider. However, for some reason I get a strange error message. I searched extensively and tried many different options before submitting this question.

Dockerfile

FROM python:2.7

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable

# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/

# install xvfb
RUN apt-get install -yqq xvfb

# install pyvirtualdisplay
RUN pip install pyvirtualdisplay

# set display port and dbus env to avoid hanging
ENV DISPLAY=:99
ENV DBUS_SESSION_BUS_ADDRESS=/dev/null

#install scrapy
RUN pip install --upgrade pip && \
    pip install --upgrade \
        setuptools \
        wheel && \
    pip install --upgrade scrapy

# install selenium
RUN pip install selenium==3.8.0

# install xlrd
RUN pip install xlrd

# install bs4
RUN pip install beautifulsoup4

ADD . /tralala/

WORKDIR tralala/
CMD scrapy crawl personel_spider_mpc -a chunksNo=$chunksNo -a chunkI=$chunkI

I guess the problem may be in the CMD part.

The spider's init part:

class Crawler(scrapy.Spider):

    name = "personel_spider_mpc"

    allowed_domains = ['tralala.de',]

    def __init__(self, vdisplay = True, **kwargs):
        super(Crawler, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo

How I run the container:

docker run --env chunksNo='10' --env chunkI='1' ostapp/tralala

I tried both with quotes and without them.
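For what it's worth, the docker run line itself should be fine: a shell-form CMD runs under /bin/sh -c, so $chunksNo and $chunkI are expanded at container start from the values passed with --env. A quick local check, no Docker needed (echo stands in for the real scrapy command):

```shell
# Simulate --env plus shell-form CMD: env assignments before the command,
# then sh -c expands the variables, just like Docker's shell-form CMD does.
out=$(chunksNo='10' chunkI='1' sh -c 'echo chunksNo=$chunksNo chunkI=$chunkI')
echo "$out"
```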

Error message:

2018-04-04 16:42:32 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/tralala/tralala/spiders/tralala_spider_mpc.py", line 673, in __init__
    self.chunkI = chunkI
NameError: global name 'chunkI' is not defined

1 Answer:

Answer 0 (score: 1)

Your arguments are stored in kwargs, which is just a dictionary whose keys are the argument names and whose values are the argument values. It does not define those names as variables for you, which is why you get the error.
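The behavior can be reproduced without Scrapy or Docker at all; a minimal sketch of what happens inside the spider's `__init__`:

```python
# **kwargs collects the extra keyword arguments into a plain dict; it does
# NOT create local variable names, so the bare name chunkI is undefined.
class Crawler(object):
    def __init__(self, vdisplay=True, **kwargs):
        # At this point kwargs == {'chunkI': '1', 'chunksNo': '10'},
        # but referencing chunkI as a bare name raises NameError.
        try:
            self.chunkI = chunkI  # noqa: F821 - deliberately undefined
        except NameError as e:
            self.error = str(e)

c = Crawler(chunkI='1', chunksNo='10')
print(c.error)  # the same NameError as in the traceback above
```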

有关详细信息,请参阅this answer

In your specific case, try `self.chunkI = kwargs['chunkI']` and `self.chunksNo = kwargs['chunksNo']`.
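Applied to the spider above, the fix can take either of two shapes; a sketch (a plain stand-in base class is used here so the example is self-contained, but in the real spider the base is `scrapy.Spider` and the `-a` values arrive as keyword arguments):

```python
class Spider(object):  # stand-in for scrapy.Spider in this sketch
    def __init__(self, **kwargs):
        pass

# Option 1: read the values out of the kwargs dict, as suggested above.
class CrawlerV1(Spider):
    def __init__(self, vdisplay=True, **kwargs):
        super(CrawlerV1, self).__init__(**kwargs)
        self.chunkI = kwargs['chunkI']
        self.chunksNo = kwargs['chunksNo']

# Option 2: declare the arguments as explicit keyword parameters, so the
# names exist directly; unrelated kwargs still pass through to the base.
class CrawlerV2(Spider):
    def __init__(self, vdisplay=True, chunkI=None, chunksNo=None, **kwargs):
        super(CrawlerV2, self).__init__(**kwargs)
        self.chunkI = chunkI
        self.chunksNo = chunksNo

# Note: -a values are passed as strings, e.g. -a chunkI=1 gives '1'.
a = CrawlerV1(chunkI='1', chunksNo='10')
b = CrawlerV2(chunkI='1', chunksNo='10')
```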