Docker Scrapy spider saving data to Postgres: port error

Date: 2018-08-08 02:44:32

Tags: python postgresql docker scrapy port

I'm trying to run my Scrapy spider on a VPS server, so I built a Docker image for it and wired it up to the PostgreSQL and scrapy-splash images. When I start the spider with docker-compose up, I get a port error, and the spider doesn't seem to recognize self.cur in my pipelines.py.

When I run the spider on my local PC, it works fine, with no port error and no errors in pipelines.py.

The error on the VPS server:

2018-08-08 02:19:10 [scrapy.middleware] INFO: Enabled spider middlewares:
web_1        | ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
web_1        |  'scrapy_splash.SplashDeduplicateArgsMiddleware',
web_1        |  'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
web_1        |  'tutorial.middlewares.TutorialSpiderMiddleware',
web_1        |  'scrapy.spidermiddlewares.referer.RefererMiddleware',
web_1        |  'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
web_1        |  'scrapy.spidermiddlewares.depth.DepthMiddleware']
web_1        | 2018-08-08 02:19:10 [scrapy.middleware] INFO: Enabled item pipelines:
web_1        | ['tutorial.pipelines.TutorialPipeline']
web_1        | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Spider opened
web_1        | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Closing spider (shutdown)
web_1        | 2018-08-08 02:19:10 [scrapy.core.engine] ERROR: Scraper close failure
web_1        | Traceback (most recent call last):
web_1        |   File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1        |     yield self.engine.open_spider(self.spider, start_requests)
web_1        | psycopg2.OperationalError: could not connect to server: Connection refused
web_1        |  Is the server running on host "localhost" (127.0.0.1) and accepting
web_1        |  TCP/IP connections on port 5432?
web_1        | could not connect to server: Cannot assign requested address
web_1        |  Is the server running on host "localhost" (::1) and accepting
web_1        |  TCP/IP connections on port 5432?
web_1        |
web_1        |
web_1        | During handling of the above exception, another exception occurred:
web_1        |
web_1        | Traceback (most recent call last):
web_1        |   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
web_1        |     current.result = callback(current.result, *args, **kw)
web_1        |   File "/scrapy_estate/tutorial/pipelines.py", line 19, in close_spider
web_1        |     self.cur.close()
web_1        | AttributeError: 'TutorialPipeline' object has no attribute 'cur'
web_1        | 2018-08-08 02:19:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
web_1        | {'finish_reason': 'shutdown',
web_1        |  'finish_time': datetime.datetime(2018, 8, 8, 2, 19, 10, 744998),
web_1        |  'log_count/ERROR': 1,
web_1        |  'log_count/INFO': 6}
web_1        | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Spider closed (shutdown)
web_1        | Unhandled error in Deferred:
web_1        | 2018-08-08 02:19:10 [twisted] CRITICAL: Unhandled error in Deferred:
web_1        |
web_1        | 2018-08-08 02:19:10 [twisted] CRITICAL:
web_1        | Traceback (most recent call last):
web_1        |   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
web_1        |     result = g.send(result)
web_1        |   File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1        |     yield self.engine.open_spider(self.spider, start_requests)
web_1        | psycopg2.OperationalError: could not connect to server: Connection refused
web_1        |  Is the server running on host "localhost" (127.0.0.1) and accepting
web_1        |  TCP/IP connections on port 5432?
web_1        | could not connect to server: Cannot assign requested address
web_1        |  Is the server running on host "localhost" (::1) and accepting
web_1        |  TCP/IP connections on port 5432?

My Dockerfile:

FROM ubuntu:18.04
FROM python:3.6-onbuild
RUN apt-get update && apt-get upgrade -y && apt-get install python-pip -y && pip3 install psycopg2 && pip3 install psycopg2-binary
RUN pip3 install --upgrade pip
RUN pip3 install scrapy --upgrade
RUN pip3 install scrapy-splash
COPY . /scrapy_estate
WORKDIR /scrapy_estate
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 80
EXPOSE 5432/tcp
CMD scrapy crawl estate

My docker-compose.yml:

version: "3"
services:
  interface:
    links:
      - postgres:postgres
    image: adminer
    ports:
      - "8080:8080"
    networks:
      - webnet
  postgres:
    image: postgres
    container_name: postgres
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: '123'
    volumes:
    - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - webnet

  web:
    image: user/scrapy_estate:latest
    build: ./tutorial
    ports:
      - "8081:8081"
    environment:
      DB_HOST: postgres
    networks:
      - webnet
  splash:
    image: scrapinghub/splash
    ports:
     - "8050:8050"
    expose:
     - "8050"
networks:
  webnet:

My pipelines.py:

import psycopg2
class TutorialPipeline(object):
    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = '123' # your password
        database = 'real_estate'
        self.connection = psycopg2.connect(host=hostname, user=username, password=password, dbname=database)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute("insert into estate(estate_title,estate_address,estate_area,estate_description,estate_price,estate_type,estate_tag,estate_date,estate_seller_name,estate_seller_address,estate_seller_phone,estate_seller_mobile,estate_seller_email) values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",(item['estate_title'],item['estate_address'],item['estate_area'],item['estate_description'],item['estate_price'],item['estate_type'],item['estate_tag'],item['estate_date'],item['estate_seller_name'],item['estate_seller_address'],item['estate_seller_phone'],item['estate_seller_mobile'],item['estate_seller_email']))
        self.connection.commit()
        return item

Edit

The spider works now. The problem was that I hadn't published port 5432 in docker-compose, and PostgreSQL was already installed directly on my VPS, so the port was already in use. I killed the process occupying port 5432 on the VPS, ran it again, and it worked.
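For reference, here is a small illustrative sketch (not part of the original setup) for detecting this kind of port conflict: if the bind fails, something on the host, such as a PostgreSQL instance installed directly on the VPS, is already listening on 5432, and Docker cannot publish the container's port there.

import socket

# Try to bind port 5432 on the host; an OSError means the port is already
# taken (e.g. by a host-installed PostgreSQL), so Docker can't publish it.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("0.0.0.0", 5432))
    print("port 5432 is free")
except OSError as exc:
    print("port 5432 is in use:", exc)
finally:
    s.close()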

1 Answer:

Answer 0 (score: 1)

Because the container's gateway IP address is 172.17.0.1.

So you should change hostname = 'localhost' to hostname = '172.17.0.1' in your pipelines.py file, then run it again.
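A minimal sketch of the adjusted open_spider, as one way to apply this: the docker-compose.yml above already sets DB_HOST: postgres for the web service (the service name also resolves on the shared webnet network), so the sketch reads that variable and falls back to the gateway IP suggested here. This is an illustration, not the asker's exact code.

import os
import psycopg2

class TutorialPipeline(object):
    def open_spider(self, spider):
        # Prefer the DB_HOST variable set in docker-compose.yml ('postgres',
        # the service name); fall back to the bridge gateway suggested above.
        hostname = os.environ.get('DB_HOST', '172.17.0.1')
        self.connection = psycopg2.connect(
            host=hostname,
            user='postgres',
            password='123',
            dbname='real_estate',
        )
        self.cur = self.connection.cursor()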

And add the ports to the docker-compose.yml for the postgres container:

postgres:
    image: postgres
    container_name: postgres
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: '123'
    volumes:
    - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
    - "5432:5432"
    expose:
    - "5432"
    networks:
      - webnet
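Once the ports are published, a quick way to confirm the database is reachable from the web container is a one-off connection test. This is a sketch using the credentials and database name from the files above; 'postgres' resolves to the postgres service on the webnet network.

import psycopg2

# One-off connectivity check: connect with the credentials from
# docker-compose.yml and print the resolved connection parameters.
conn = psycopg2.connect(
    host='postgres',
    user='postgres',
    password='123',
    dbname='real_estate',
)
print(conn.get_dsn_parameters())
conn.close()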