I'm trying to run my Scrapy spider on a VPS server, so I set it up with Docker images, linking my spider image to PostgreSQL, Scrapy, and scrapy-splash images. When I start the spider with docker-compose up, I get a port error, and the spider doesn't seem to recognize self.cur in my pipelines.py.
When I run the spider on my local PC, it works fine, with no port error and no errors in pipelines.py.
The error on the VPS server:
2018-08-08 02:19:10 [scrapy.middleware] INFO: Enabled spider middlewares:
web_1 | ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
web_1 | 'scrapy_splash.SplashDeduplicateArgsMiddleware',
web_1 | 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
web_1 | 'tutorial.middlewares.TutorialSpiderMiddleware',
web_1 | 'scrapy.spidermiddlewares.referer.RefererMiddleware',
web_1 | 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
web_1 | 'scrapy.spidermiddlewares.depth.DepthMiddleware']
web_1 | 2018-08-08 02:19:10 [scrapy.middleware] INFO: Enabled item pipelines:
web_1 | ['tutorial.pipelines.TutorialPipeline']
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Spider opened
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Closing spider (shutdown)
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] ERROR: Scraper close failure
web_1 | Traceback (most recent call last):
web_1 | File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in crawl
web_1 | yield self.engine.open_spider(self.spider, start_requests)
web_1 | psycopg2.OperationalError: could not connect to server: Connection refused
web_1 | Is the server running on host "localhost" (127.0.0.1) and accepting
web_1 | TCP/IP connections on port 5432?
web_1 | could not connect to server: Cannot assign requested address
web_1 | Is the server running on host "localhost" (::1) and accepting
web_1 | TCP/IP connections on port 5432?
web_1 |
web_1 |
web_1 | During handling of the above exception, another exception occurred:
web_1 |
web_1 | Traceback (most recent call last):
web_1 | File "/usr/local/lib/python3.6/site-packages/twisted/internet/d efer.py", line 654, in _runCallbacks
web_1 | current.result = callback(current.result, *args, **kw)
web_1 | File "/scrapy_estate/tutorial/pipelines.py", line 19, in cl ose_spider
web_1 | self.cur.close()
web_1 | AttributeError: 'TutorialPipeline' object has no attribute 'cur'
web_1 | 2018-08-08 02:19:10 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
web_1 | {'finish_reason': 'shutdown',
web_1 | 'finish_time': datetime.datetime(2018, 8, 8, 2, 19, 10, 744998),
web_1 | 'log_count/ERROR': 1,
web_1 | 'log_count/INFO': 6}
web_1 | 2018-08-08 02:19:10 [scrapy.core.engine] INFO: Spider closed (shutdown)
web_1 | Unhandled error in Deferred:
web_1 | 2018-08-08 02:19:10 [twisted] CRITICAL: Unhandled error in Deferred:
web_1 |
web_1 | 2018-08-08 02:19:10 [twisted] CRITICAL:
web_1 | Traceback (most recent call last):
web_1 | File "/usr/local/lib/python3.6/site-packages/twisted/internet/d efer.py", line 1418, in _inlineCallbacks
web_1 | result = g.send(result)
web_1 | File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py" , line 82, in crawl
web_1 | yield self.engine.open_spider(self.spider, start_requests)
web_1 | psycopg2.OperationalError: could not connect to server: Connection refused
web_1 | Is the server running on host "localhost" (127.0.0.1) and accepting
web_1 | TCP/IP connections on port 5432?
web_1 | could not connect to server: Cannot assign requested address
web_1 | Is the server running on host "localhost" (::1) and accepting
web_1 | TCP/IP connections on port 5432?
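As far as I can tell, the two errors are linked: open_spider raises psycopg2.OperationalError before self.cur is ever assigned, so when Scrapy then shuts the spider down, close_spider fails with the AttributeError.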
My Dockerfile:
FROM ubuntu:18.04
FROM python:3.6-onbuild
RUN apt-get update && apt-get upgrade -y && apt-get install python-pip -y && pip3 install psycopg2 && pip3 install psycopg2-binary
RUN pip3 install --upgrade pip
RUN pip3 install scrapy --upgrade
RUN pip3 install scrapy-splash
COPY . /scrapy_estate
WORKDIR /scrapy_estate
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 80
EXPOSE 5432/tcp
CMD scrapy crawl estate
docker-compose.yml:
version: "3"
services:
interface:
links:
- postgres:postgres
image: adminer
ports:
- "8080:8080"
networks:
- webnet
postgres:
image: postgres
container_name: postgres
environment:
POSTGRES_USER: 'postgres'
POSTGRES_PASSWORD: '123'
volumes:
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
networks:
- webnet
web:
image: user/scrapy_estate:latest
build: ./tutorial
ports:
- "8081:8081"
networks:
- webnet
environment:
DB_HOST: postgres
networks:
- webnet
splash:
image: scrapinghub/splash
ports:
- "8050:8050"
expose:
- "8050"
networks:
webnet:
My pipelines.py:
import psycopg2


class TutorialPipeline(object):
    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = '123'  # your password
        database = 'real_estate'
        self.connection = psycopg2.connect(host=hostname, user=username,
                                           password=password, dbname=database)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute(
            "insert into estate(estate_title, estate_address, estate_area, estate_description, estate_price, estate_type, estate_tag, estate_date, estate_seller_name, estate_seller_address, estate_seller_phone, estate_seller_mobile, estate_seller_email) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
            (item['estate_title'], item['estate_address'], item['estate_area'],
             item['estate_description'], item['estate_price'], item['estate_type'],
             item['estate_tag'], item['estate_date'], item['estate_seller_name'],
             item['estate_seller_address'], item['estate_seller_phone'],
             item['estate_seller_mobile'], item['estate_seller_email']))
        self.connection.commit()
        return item
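Side note: hostname is hardcoded to 'localhost' here, while docker-compose.yml already passes DB_HOST: postgres into the web container, and inside a container localhost refers to the container itself, not to the postgres service. A minimal sketch of an open_spider that reads the host from the environment instead (the DB_HOST name matches the compose file; the 'localhost' fallback for running outside Docker is my assumption):

import os
import psycopg2

class TutorialPipeline(object):
    def open_spider(self, spider):
        # Prefer the compose-provided service hostname; fall back to
        # localhost so the spider still runs outside Docker.
        hostname = os.environ.get('DB_HOST', 'localhost')
        self.connection = psycopg2.connect(host=hostname, user='postgres',
                                           password='123', dbname='real_estate')
        self.cur = self.connection.cursor()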
Update: the spider works now. I hadn't exposed port 5432 in docker-compose, and PostgreSQL was already installed on the VPS itself, so the port was already in use. I killed the process on port 5432 on the VPS, ran it again, and it works.
Answer 0 (score: 1)
Because the container's gateway IP address is 172.17.0.1, you should change hostname = 'localhost' to hostname = '172.17.0.1' in your pipelines.py file, then run it again.
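Note that 172.17.0.1 is the gateway of Docker's default bridge and can vary between hosts. Since the web and postgres containers share the webnet network, connecting by the compose service name is more portable; a one-line sketch with the credentials from the question:

self.connection = psycopg2.connect(host='postgres', user='postgres',
                                   password='123', dbname='real_estate')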
Add the ports to the postgres service in your docker-compose.yml:
postgres:
  image: postgres
  container_name: postgres
  environment:
    POSTGRES_USER: 'postgres'
    POSTGRES_PASSWORD: '123'
  volumes:
    - ./init.sql:/docker-entrypoint-initdb.d/init.sql
  ports:
    - "5432:5432"
  expose:
    - "5432"
  networks:
    - webnet
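Even with the port published and the right hostname, docker-compose can start the web container before Postgres is ready to accept connections, which produces the same psycopg2.OperationalError. A minimal retry sketch (connect_with_retry is a hypothetical helper; the attempt count and delay are arbitrary assumptions):

import os
import time
import psycopg2

def connect_with_retry(attempts=10, delay=3):
    # Keep retrying until Postgres inside its container accepts connections.
    for attempt in range(attempts):
        try:
            return psycopg2.connect(host=os.environ.get('DB_HOST', 'localhost'),
                                    user='postgres', password='123',
                                    dbname='real_estate')
        except psycopg2.OperationalError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

This could be called from open_spider in place of the direct psycopg2.connect call.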