I'm new to Docker and have built a custom container to run my spider on my cloud server. The scraper is built with Python 3.6, Scrapy 1.6, and Selenium, and everything runs inside a single Docker container. When the spider starts, Scrapy's open_spider method runs another Python script in the directory, which generates the URLs for Scrapy to crawl. That script saves the links to a text file, but I get PermissionError: [Errno 13] Permission denied: 'tmp'.
I have tried running chmod 777 and a+rw on the tmp folder so that it would let me create the text file, but I still get the permission-denied error. I have been researching this for several days and cannot find a fix.
The OS on my laptop is Ubuntu 18.04.
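For context, a minimal sketch of the kind of link-generating script described above (the function name and file names are hypothetical, not from my actual code). Writing to an absolute path and creating the directory first avoids one common cause of this error, where a bare 'tmp' resolves against whatever the current working directory happens to be:

```python
import os

# Absolute path; a relative 'tmp' would resolve against the process cwd.
OUTPUT_DIR = "/tmp/links"

def save_links(links):
    # Create the directory (and any parents) if missing; no error if it exists.
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    path = os.path.join(OUTPUT_DIR, "urls.txt")
    with open(path, "w") as f:
        f.write("\n".join(links))
    return path
```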
Below is a link to my Dockerfile:
Dockerfile
And here is a link to my setup.py file:
FROM scrapinghub/scrapinghub-stack-scrapy:1.6-py3
RUN apt-get -y --no-install-recommends install zip unzip jq libxml2 libxml2-dev
RUN printf "deb http://archive.debian.org/debian/ jessie main\ndeb-src http://archive.debian.org/debian/ jessie main\ndeb http://security.debian.org jessie/updates main\ndeb-src http://security.debian.org jessie/updates main" > /etc/apt/sources.list
#============================================
# Google Chrome
#============================================
# can specify versions by CHROME_VERSION;
# e.g. google-chrome-stable=53.0.2785.101-1
# google-chrome-beta=53.0.2785.92-1
# google-chrome-unstable=54.0.2840.14-1
# latest (equivalent to google-chrome-stable)
# google-chrome-beta (pull latest beta)
#============================================
ARG CHROME_VERSION="google-chrome-stable"
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
&& apt-get update -qqy \
&& apt-get -qqy install \
${CHROME_VERSION:-google-chrome-stable} \
&& rm /etc/apt/sources.list.d/google-chrome.list \
&& rm -rf /var/lib/apt/lists/* /var/cache/apt/*
#============================================
# Chrome Webdriver
#============================================
# can specify versions by CHROME_DRIVER_VERSION
# Latest released version will be used by default
#============================================
ARG CHROME_DRIVER_VERSION
RUN CHROME_STRING=$(google-chrome --version) \
&& CHROME_VERSION_STRING=$(echo "${CHROME_STRING}" | grep -oP "\d+\.\d+\.\d+\.\d+") \
&& CHROME_MAYOR_VERSION=$(echo "${CHROME_VERSION_STRING%%.*}") \
&& wget --no-verbose -O /tmp/LATEST_RELEASE "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_MAYOR_VERSION}" \
&& CD_VERSION=$(cat "/tmp/LATEST_RELEASE") \
&& rm /tmp/LATEST_RELEASE \
&& if [ -z "$CHROME_DRIVER_VERSION" ]; \
then CHROME_DRIVER_VERSION="${CD_VERSION}"; \
fi \
&& CD_VERSION=$(echo $CHROME_DRIVER_VERSION) \
&& echo "Using chromedriver version: "$CD_VERSION \
&& wget --no-verbose -O /tmp/chromedriver_linux64.zip https://chromedriver.storage.googleapis.com/$CD_VERSION/chromedriver_linux64.zip \
&& rm -rf /opt/selenium/chromedriver \
&& unzip /tmp/chromedriver_linux64.zip -d /opt/selenium \
&& rm /tmp/chromedriver_linux64.zip \
&& mv /opt/selenium/chromedriver /opt/selenium/chromedriver-$CD_VERSION \
&& chmod 755 /opt/selenium/chromedriver-$CD_VERSION \
&& sudo ln -fs /opt/selenium/chromedriver-$CD_VERSION /usr/bin/chromedriver
#============================================
# crawlera-headless-proxy
#============================================
RUN curl -L https://github.com/scrapinghub/crawlera-headless-proxy/releases/download/1.1.1/crawlera-headless-proxy-linux-amd64 -o /usr/local/bin/crawlera-headless-proxy \
&& chmod +x /usr/local/bin/crawlera-headless-proxy
RUN chmod a+rw app/cars/spiders
RUN chmod a+rw app/cars/tmp
COPY ./start-crawl /usr/local/bin/start-crawl
ENV TERM xterm
ENV SCRAPY_SETTINGS_MODULE cars.settings
RUN pip install --upgrade pip
RUN mkdir -p /app
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
RUN python setup.py install
RUN chmod a+rw app/cars/tmp
Answer 0 (score: 1)
Add the following to your Dockerfile:
RUN adduser --disabled-login dockeruser
RUN chown -R dockeruser:dockeruser /tmp/
USER dockeruser
Note: the chown must run before the USER instruction, while the build is still root.
If --disabled-login does not work, use --disabled-password instead.
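To confirm the switch to a non-root user actually fixed the permissions, a small check like this (a sketch, not part of the answer's code) can be run inside the container to verify the current user can create files in a directory:

```python
import os
import tempfile

def can_write(dirpath):
    """Return True if the current user can create files in dirpath."""
    try:
        # Attempt to create and immediately remove a temp file there.
        fd, path = tempfile.mkstemp(dir=dirpath)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        # Covers PermissionError and FileNotFoundError alike.
        return False
```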
Answer 1 (score: 0)
Perhaps the setup.py file in the following line:
RUN python setup.py install
needs to be executed with the appropriate permissions?