今天我在一个高山容器里用dryscrape做一个python脚本。
这是我的Dokerfile:
FROM alpine:3.7
RUN apk add --update bash &&\
apk update &&\
apk upgrade
RUN apk add --no-cache python-dev ;\
apk add --no-cache python
RUN apk add --no-cache py-pip &&\
apk add --no-cache linux-headers &&\
apk add --no-cache texinfo &&\
apk add --no-cache gcc &&\
apk add --no-cache g++ &&\
apk add --no-cache gfortran &&\
apk add --no-cache libxml2-dev &&\
apk add --no-cache xmlsec-dev &&\
apk add --no-cache py-requests &&\
apk add --no-cache make &&\
apk add --no-cache qt-dev
RUN pip install beautifulsoup4 &&\
pip install requests &&\
pip install lxml &&\
pip install html5lib &&\
pip install urllib3 &&\
pip install dryscrape
RUN apk add --no-cache icu-libs &&\
apk add --no-cache git &&\
git clone "https://github.com/niklasb/webkit-server.git" &&\
cd webkit-server &&\
python setup.py install
# prepare le shell
CMD ["bash"]
WORKDIR "/root"
我忘记了事情,因为当我开始dryscrape.Session()
File "/usr/lib/python2.7/site-packages/webkit_server.py", line 427, in __init__
raise WebkitServerError("webkit-server failed to start. Output:\n" + err)
webkit_server.WebkitServerError: webkit-server failed to start. Output:
webkit_server: cannot connect to X server
你知道为什么我会收到这个错误吗?谢谢大家
答案 0 :(得分:0)
您需要在容器内运行X服务器才能使用dryscrape.Session()
。
在Alpine平台上实现这一目标的最佳方法是安装并运行xvfb
(X虚拟帧缓冲区的简称)。
一些例子:
1。 Dockerfile
的结尾是:
RUN apk add --no-cache icu-libs &&\
apk add --no-cache git &&\
git clone "https://github.com/niklasb/webkit-server.git" &&\
cd webkit-server &&\
python setup.py install
RUN apk add --no-cache xvfb
ADD start_script.sh /root/start_script.sh
# prepare le shell
CMD ["/root/start_script.sh"]
WORKDIR "/root"
2。 start_script.sh
是:
#!/bin/sh
Xvfb :00 &
export DISPLAY=:00
bash
现在您可以继续dryscrape.Session()
:
$ docker run -ti <image_id>
bash-4.4# ps aux
PID USER TIME COMMAND
1 root 0:00 {start_script.sh} /bin/sh /root/start_script.sh
7 root 0:00 Xvfb :00
8 root 0:00 bash
11 root 0:00 ps aux
bash-4.4# python
Python 2.7.14 (default, Dec 14 2017, 15:51:29)
[GCC 6.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dryscrape
>>> dryscrape.Session()
<dryscrape.session.Session object at 0x7f1601ea4ad0>
>>>