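I'm trying to use Airflow to schedule web-scraping tasks and dump the results into MongoDB on my local machine. I'm using the puckel/docker-airflow image, which I modified to include MongoDB as an additional service.
I've tried various solutions posted here, including:
But I'm still facing the same issue.
I'm doing something wrong, but I'm not sure what.
Here is the modified docker-compose file: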
version: '3'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        # link to common network
        networks:
            - app_tier
    # Custom mongo db
    mongo:
        image: mongo:3.6.3
        restart: always
        volumes:
            - /data/db:/data/db
        ports:
            - "172.17.0.1:27017:27017"
        networks:
            - app_tier
    webserver:
        image: puckel/docker-airflow:1.10.2
        restart: always
        depends_on:
            - postgres
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        volumes:
            - ./dags:/usr/local/airflow/dags
            # Uncomment to include custom plugins
            # - ./plugins:/usr/local/airflow/plugins
            # Custom python package
            - ./requirements.txt:/requirements.txt
            # FIFA file path
            - ~/FIFA:/FIFA
            # Mongo DB path
            - /data/db:/data/db
        # link to common network
        networks:
            - app_tier
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
networks:
    app_tier:
        driver: bridge
I'm connecting to MongoDB using mongodb://mongo:27017.
In my logs, I get the following error:
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused
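The error names localhost:27017 even though the URI above points at mongo:27017, so it may help to check which of the two is actually reachable from inside the webserver container. The following is a minimal diagnostic sketch using only the standard library; the helper name `can_connect` is hypothetical and not part of pymongo or Airflow.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Run inside the webserver container, `can_connect("mongo", 27017)` probes the compose service name and `can_connect("localhost", 27017)` probes the address named in the error. This only tests TCP reachability; it says nothing about MongoDB authentication or write permissions.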
Any idea what I'm doing wrong?
TIA!
Note: I have already looked at the answers to this question:
From inside of a Docker container, how do I connect to the localhost of the machine?
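But I'm having a hard time implementing it in a docker-compose file.
Running the containers individually is challenging because the entrypoint.sh script of the puckel/docker-airflow image depends on postgres running (and I don't know how to get it to run the same way on my local machine). Even then, running each service separately is rather tedious. I tried running a standalone Python image and successfully dumped results from the container to my local machine, but I don't know how to do the same with the puckel/docker-airflow image, so I'm stuck.
Is there a solution that works with docker-compose?
Edit: It seems that Docker can read from my local machine but cannot write to it. When mongod is running on my local machine, I get logs indicating that a connection from the Docker container was established and that data was sent to it: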
2019-06-04T15:51:34.299-0400 I NETWORK [listener] connection accepted from 172.23.0.3:48768 #8 (8 connections now open)
2019-06-04T15:51:34.299-0400 I NETWORK [conn8] received client metadata from 172.23.0.3:48768 conn: { driver: { name: "PyMongo", version: "3.8.0" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.15.0-48-generic" }, platform: "CPython 3.6.8.final.0" }
2019-06-04T15:51:34.550-0400 I COMMAND [conn8] command agents_proxies.user_agents command: getMore { getMore: 20847821675, collection: "user_agents", lsid: { id: UUID("69b1fd25-36f8-49a4-8a14-bafc83483abb") }, $db: "agents_proxies", $readPreference: { mode: "primary" } } originatingCommand: { find: "user_agents", filter: { $and: [ { $or: [ { OS: "Windows" }, { OS: "Mac OS X" }, { OS: "macOS" }, { OS: "Linux" } ] }, { $or: [ { hardware_type: "Computer" }, { hardware_type: "Windows" }, { hardware_type: "Linux" }, { hardware_type: "Mac" } ] }, { $or: [ { popularity: "Very common" }, { popularity: "Common" } ] } ] }, projection: { _id: 0, user_agent: 1 }, lsid: { id: UUID("69b1fd25-36f8-49a4-8a14-bafc83483abb") }, $db: "agents_proxies", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:20847821675 keysExamined:0 docsExamined:109441 cursorExhausted:1 numYields:855 nreturned:1188 reslen:163454 locks:{ Global: { acquireCount: { r: 1712 } }, Database: { acquireCount: { r: 856 } }, Collection: { acquireCount: { r: 856 } } } protocol:op_msg 248ms
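However, when my Python script tries to store data, I get the connection-refused message from pymongo. I'm starting to think this has something to do with the Airflow Dockerfile or the entrypoint.sh script.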
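Since reads succeed while the failing write reports localhost:27017, one thing that could be ruled out is that the writing code path builds its own client with pymongo's default host (a bare `MongoClient()` defaults to localhost:27017). Below is a rough sketch of one way to keep every client on the same address by building the URI in a single place. The function name `mongo_uri` and the environment variables `MONGO_HOST` and `MONGO_PORT` are assumptions made for illustration; they are not defined by the puckel/docker-airflow image.

```python
import os


def mongo_uri(default_host: str = "mongo", default_port: int = 27017) -> str:
    """Build the MongoDB URI from environment variables.

    MONGO_HOST and MONGO_PORT are hypothetical variable names; they would
    need to be set under `environment:` for the webserver service.
    """
    host = os.environ.get("MONGO_HOST", default_host)
    port = int(os.environ.get("MONGO_PORT", default_port))
    return f"mongodb://{host}:{port}"
```

Passing `mongo_uri()` to `pymongo.MongoClient` in both the reading and the writing code would make it easier to see whether any path still falls back to localhost.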