How do I install Scrapy on Ubuntu 16.04?

Asked: 2016-06-07 01:47:33

Tags: python ubuntu scrapy python-3.5 ubuntu-16.04

I followed the official guide, but got the following error message:

The following packages have unmet dependencies:
 scrapy : Depends: python-support (>= 0.90.0) but it is not installable
          Recommends: python-setuptools but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Then I tried sudo apt-get install python-support, but it turns out python-support has been removed from Ubuntu 16.04.

Finally, I tried installing python-setuptools, but that appears to pull in Python 2 only:

The following additional packages will be installed:
 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib python
 python-minimal python-pkg-resources python2.7 python2.7-minimal
Suggested packages:
 python-doc python-tk python-setuptools-doc python2.7-doc binutils
 binfmt-support
The following NEW packages will be installed:
 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib python
 python-minimal python-pkg-resources python-setuptools python2.7
 python2.7-minimal

How can I use Scrapy with Python 3 on Ubuntu 16.04? Thanks.

1 answer:

Answer 0 (score: 2)

You should be good with:

apt-get install -y \
    python3 \
    python-dev \
    python3-dev

# build dependencies for cryptography
apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev

# build dependencies for lxml
apt-get install -y \
    libxml2-dev \
    libxslt-dev

# install pip
apt-get install -y python-pip
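
If you want to stay on Python 3 end to end, a minimal variant is sketched below, assuming the python3-pip package from Xenial's repositories (so that pip3 installs packages for the Python 3 interpreter):

# Python 3 variant -- a sketch; python3-pip comes from the Ubuntu repositories
apt-get install -y python3-pip
# pip3 targets Python 3, unlike the python-pip package above
pip3 install scrapy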

Here is an example Dockerfile to test installing Scrapy for Python 3 on Ubuntu 16.04 / Xenial:

$ cat Dockerfile
FROM ubuntu:xenial

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update

# Install Python3 and dev headers
RUN apt-get install -y \
    python3 \
    python-dev \
    python3-dev

# Install build dependencies for cryptography
RUN apt-get install -y \
    build-essential \
    libssl-dev \
    libffi-dev

# install build dependencies for lxml
RUN apt-get install -y \
    libxml2-dev \
    libxslt-dev

# install pip
RUN apt-get install -y python-pip

RUN useradd --create-home --shell /bin/bash scrapyuser

USER scrapyuser
WORKDIR /home/scrapyuser
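
As an aside, in a long-lived Dockerfile you would usually collapse the update and install steps into a single RUN layer, so that Docker's layer cache can never pair a stale apt-get update with a newer install. A sketch of that variant:

# Single-layer variant (a sketch): update and install together so the
# package lists and the installs always come from the same apt run
RUN apt-get update && apt-get install -y \
    python3 python-dev python3-dev \
    build-essential libssl-dev libffi-dev \
    libxml2-dev libxslt-dev \
    python-pip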

Then, after building the Docker image and running a container from it:

$ sudo docker build -t redapple/scrapy-ubuntu-xenial .
$ sudo docker run -t -i redapple/scrapy-ubuntu-xenial

you can run pip install scrapy.

Below, I'm using virtualenvwrapper to create a Python 3 virtualenv:

scrapyuser@88cc645ac499:~$ pip install --user  virtualenvwrapper
Collecting virtualenvwrapper
  Downloading virtualenvwrapper-4.7.1-py2.py3-none-any.whl
Collecting virtualenv-clone (from virtualenvwrapper)
  Downloading virtualenv-clone-0.2.6.tar.gz
Collecting stevedore (from virtualenvwrapper)
  Downloading stevedore-1.14.0-py2.py3-none-any.whl
Collecting virtualenv (from virtualenvwrapper)
  Downloading virtualenv-15.0.2-py2.py3-none-any.whl (1.8MB)
    100% |################################| 1.8MB 320kB/s
Collecting pbr>=1.6 (from stevedore->virtualenvwrapper)
  Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB)
    100% |################################| 102kB 1.5MB/s
Collecting six>=1.9.0 (from stevedore->virtualenvwrapper)
  Downloading six-1.10.0-py2.py3-none-any.whl
Building wheels for collected packages: virtualenv-clone
  Running setup.py bdist_wheel for virtualenv-clone ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/24/51/ef/93120d304d240b4b6c2066454250a1626e04f73d34417b956d
Successfully built virtualenv-clone
Installing collected packages: virtualenv-clone, pbr, six, stevedore, virtualenv, virtualenvwrapper
Successfully installed pbr six stevedore virtualenv virtualenv-clone virtualenvwrapper
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
scrapyuser@88cc645ac499:~$ source ~/.local/bin/virtualenvwrapper.sh
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/premkproject
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postmkproject
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/initialize
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/premkvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postmkvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/prermvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postrmvirtualenv
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/predeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postdeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/preactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/postactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/get_env_details
scrapyuser@88cc645ac499:~$ export PATH=$PATH:/home/scrapyuser/.local/bin
scrapyuser@88cc645ac499:~$ mkvirtualenv --python=/usr/bin/python3 scrapy11.py3
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/python3
Also creating executable in /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/predeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/postdeactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/preactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/postactivate
virtualenvwrapper.user_scripts creating /home/scrapyuser/.virtualenvs/scrapy11.py3/bin/get_env_details
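
As an aside, on a plain Ubuntu 16.04 box Python 3.5's standard-library venv module is an alternative to virtualenvwrapper. A sketch (Ubuntu splits ensurepip out into the python3-venv package, and ~/scrapy-py3 is just an example path):

# Alternative sketch: standard-library venv instead of virtualenvwrapper
sudo apt-get install -y python3-venv   # Ubuntu ships ensurepip in this package
python3 -m venv ~/scrapy-py3           # create the environment (example path)
source ~/scrapy-py3/bin/activate       # pip inside it now targets Python 3.5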

Installing Scrapy 1.1 is then just a matter of pip install scrapy:

(scrapy11.py3) scrapyuser@88cc645ac499:~$ pip install scrapy
Collecting scrapy
  Downloading Scrapy-1.1.0-py2.py3-none-any.whl (294kB)
    100% |################################| 296kB 1.0MB/s
Collecting PyDispatcher>=2.0.5 (from scrapy)
  Downloading PyDispatcher-2.0.5.tar.gz
Collecting pyOpenSSL (from scrapy)
  Downloading pyOpenSSL-16.0.0-py2.py3-none-any.whl (45kB)
    100% |################################| 51kB 1.8MB/s
Collecting lxml (from scrapy)
  Downloading lxml-3.6.0.tar.gz (3.7MB)
    100% |################################| 3.7MB 312kB/s
Collecting parsel>=0.9.3 (from scrapy)
  Downloading parsel-1.0.2-py2.py3-none-any.whl
Collecting six>=1.5.2 (from scrapy)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting Twisted>=10.0.0 (from scrapy)
  Downloading Twisted-16.2.0.tar.bz2 (2.9MB)
    100% |################################| 2.9MB 307kB/s
Collecting queuelib (from scrapy)
  Downloading queuelib-1.4.2-py2.py3-none-any.whl
Collecting cssselect>=0.9 (from scrapy)
  Downloading cssselect-0.9.1.tar.gz
Collecting w3lib>=1.14.2 (from scrapy)
  Downloading w3lib-1.14.2-py2.py3-none-any.whl
Collecting service-identity (from scrapy)
  Downloading service_identity-16.0.0-py2.py3-none-any.whl
Collecting cryptography>=1.3 (from pyOpenSSL->scrapy)
  Downloading cryptography-1.4.tar.gz (399kB)
    100% |################################| 409kB 1.1MB/s
Collecting zope.interface>=4.0.2 (from Twisted>=10.0.0->scrapy)
  Downloading zope.interface-4.1.3.tar.gz (141kB)
    100% |################################| 143kB 1.3MB/s
Collecting attrs (from service-identity->scrapy)
  Downloading attrs-16.0.0-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->scrapy)
  Downloading pyasn1-0.1.9-py2.py3-none-any.whl
Collecting pyasn1-modules (from service-identity->scrapy)
  Downloading pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting idna>=2.0 (from cryptography>=1.3->pyOpenSSL->scrapy)
  Downloading idna-2.1-py2.py3-none-any.whl (54kB)
    100% |################################| 61kB 2.0MB/s
Requirement already satisfied (use --upgrade to upgrade): setuptools>=11.3 in ./.virtualenvs/scrapy11.py3/lib/python3.5/site-packages (from cryptography>=1.3->pyOpenSSL->scrapy)
Collecting cffi>=1.4.1 (from cryptography>=1.3->pyOpenSSL->scrapy)
  Downloading cffi-1.6.0.tar.gz (397kB)
    100% |################################| 399kB 1.1MB/s
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3->pyOpenSSL->scrapy)
  Downloading pycparser-2.14.tar.gz (223kB)
    100% |################################| 225kB 1.2MB/s
Building wheels for collected packages: PyDispatcher, lxml, Twisted, cssselect, cryptography, zope.interface, cffi, pycparser
  Running setup.py bdist_wheel for PyDispatcher ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/86/02/a1/5857c77600a28813aaf0f66d4e4568f50c9f133277a4122411
  Running setup.py bdist_wheel for lxml ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/6c/eb/a1/e4ff54c99630e3cc6ec659287c4fd88345cd78199923544412
  Running setup.py bdist_wheel for Twisted ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/fe/9d/3f/9f7b1c768889796c01929abb7cdfa2a9cdd32bae64eb7aa239
  Running setup.py bdist_wheel for cssselect ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/1b/41/70/480fa9516ccc4853a474faf7a9fb3638338fc99a9255456dd0
  Running setup.py bdist_wheel for cryptography ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/f6/6c/21/11ec069285a52d7fa8c735be5fc2edfb8b24012c0f78f93d20
  Running setup.py bdist_wheel for zope.interface ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/52/04/ad/12c971c57ca6ee5e6d77019c7a1b93105b1460d8c2db6e4ef1
  Running setup.py bdist_wheel for cffi ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/8f/00/29/553c1b1db38bbeec3fec428ae4e400cd8349ecd99fe86edea1
  Running setup.py bdist_wheel for pycparser ... done
  Stored in directory: /home/scrapyuser/.cache/pip/wheels/9b/f4/2e/d03e949a551719a1ffcb659f2c63d8444f4df12e994ce52112
Successfully built PyDispatcher lxml Twisted cssselect cryptography zope.interface cffi pycparser
Installing collected packages: PyDispatcher, idna, pyasn1, six, pycparser, cffi, cryptography, pyOpenSSL, lxml, w3lib, cssselect, parsel, zope.interface, Twisted, queuelib, attrs, pyasn1-modules, service-identity, scrapy
Successfully installed PyDispatcher-2.0.5 Twisted-16.2.0 attrs-16.0.0 cffi-1.6.0 cryptography-1.4 cssselect-0.9.1 idna-2.1 lxml-3.6.0 parsel-1.0.2 pyOpenSSL-16.0.0 pyasn1-0.1.9 pyasn1-modules-0.0.8 pycparser-2.14 queuelib-1.4.2 scrapy-1.1.0 service-identity-16.0.0 six-1.10.0 w3lib-1.14.2 zope.interface-4.1.3
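
At this point a quick sanity check doesn't hurt; scrapy version -v also reports the versions of key dependencies such as Twisted, lxml and Python itself:

# confirm Scrapy imports cleanly and show its dependency versions
scrapy version -v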

Finally, let's test it with an example project:

(scrapy11.py3) scrapyuser@88cc645ac499:~$ scrapy startproject tutorial
New Scrapy project 'tutorial', using template directory '/home/scrapyuser/.virtualenvs/scrapy11.py3/lib/python3.5/site-packages/scrapy/templates/project', created in:
    /home/scrapyuser/tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com
(scrapy11.py3) scrapyuser@88cc645ac499:~$ cd tutorial
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$ scrapy genspider example example.com
Created spider 'example' using template 'basic' in module:
  tutorial.spiders.example
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$ cat tutorial/spiders/example.py
# -*- coding: utf-8 -*-
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        pass
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$ scrapy crawl example
2016-06-07 11:08:27 [scrapy] INFO: Scrapy 1.1.0 started (bot: tutorial)
2016-06-07 11:08:27 [scrapy] INFO: Overridden settings: {'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial', 'ROBOTSTXT_OBEY': True, 'NEWSPIDER_MODULE': 'tutorial.spiders'}
2016-06-07 11:08:27 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats', 'scrapy.extensions.corestats.CoreStats']
2016-06-07 11:08:27 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-07 11:08:27 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-07 11:08:27 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-07 11:08:27 [scrapy] INFO: Spider opened
2016-06-07 11:08:28 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-07 11:08:28 [scrapy] DEBUG: Crawled (404) <GET http://www.example.com/robots.txt> (referer: None)
2016-06-07 11:08:28 [scrapy] DEBUG: Crawled (200) <GET http://www.example.com/> (referer: None)
2016-06-07 11:08:28 [scrapy] INFO: Closing spider (finished)
2016-06-07 11:08:28 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 436,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 1921,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 6, 7, 11, 8, 28, 614605),
 'log_count/DEBUG': 2,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2016, 6, 7, 11, 8, 28, 24624)}
2016-06-07 11:08:28 [scrapy] INFO: Spider closed (finished)
(scrapy11.py3) scrapyuser@88cc645ac499:~/tutorial$
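
The generated parse() is only a stub. Purely as an illustration of where your extraction logic goes (the CSS selector below is my own example, not part of the generated template), you could fill it in like this:

# -*- coding: utf-8 -*-
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        # illustrative only: yield one item with the page title and URL
        yield {
            'title': response.css('title::text').extract_first(),
            'url': response.url,
        }

Running scrapy crawl example -o items.json would then write the scraped items out as JSON.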