Here is the situation: I have successfully developed a very simple ETL process locally that extracts data from some remote location and then writes the unprocessed data into a MongoDB container on my local Windows machine. Now I want to schedule this process with Apache Airflow, using the DockerOperator for every task, i.e. I want to build a Docker image of my source code and then execute the source code inside that image via the DockerOperator. Since I am working on a Windows machine, I can only run Airflow from inside a Docker container to actually trigger the Airflow DAGs. The Airflow container (called webserver from here on) and the Mongo container (called mongo from here on) are both specified in the docker-compose.yml file that you can see at the end.
As far as I understand, every time my simple ETL DAG is triggered and a DockerOperator is executed, the webserver container spins up a new "sibling" container for each ETL task, the source code is executed inside this new container, and once the task finishes, the new container is removed again. If my understanding up to this point is correct, the webserver container needs to be able to execute Docker commands such as docker build ... in order to create these sibling containers.
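To make this concrete, below is a minimal sketch of the kind of DAG I have in mind (Airflow 1.10.x syntax; the image name, command, and schedule are placeholders, not my actual setup):
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

with DAG(
    dag_id="simple_etl",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_and_load = DockerOperator(
        task_id="extract_and_load",
        image="my-etl-image:latest",               # placeholder: image built from the ETL source code
        command="python /app/etl.py",              # placeholder: entry point inside that image
        docker_url="unix://var/run/docker.sock",   # talk to the host daemon through the mounted socket
        network_mode="bridge",
        auto_remove=True,                          # remove the sibling container once the task finishes
    )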
To test this theory, I added the /var/run/docker.sock:/var/run/docker.sock and /usr/bin/docker:/usr/bin/docker volumes to the definition of the webserver container in the docker-compose.yml file, so that the webserver container can use the Docker daemon of the host (Windows) machine. I then started the webserver and mongo containers with docker-compose up -d, entered the webserver container with docker exec -it <name_of_webserver_container> /bin/bash, and tried the simple Docker command docker ps --all. However, the output of this command was bash: docker: command not found. So it seems Docker is not properly installed inside the webserver container. How can I make sure Docker is installed inside the webserver container so that the sibling containers can be created?
Below you can find the relevant parts of the docker-compose.yml file and of the Dockerfile for the webserver container.
docker-compose.yml, located in the project root:
webserver:
    build: ./docker-airflow
    restart: always
    privileged: true
    depends_on:
        - postgres # some other service I cut out from this post
        - mongo
        - mongo-express # some other service I cut out from this post
    environment:
        - LOAD_EX=n
        - EXECUTOR=Local
        - POSTGRES_USER=some_user
        - POSTGRES_PASSWORD=some_pw
        - POSTGRES_DB=airflowdb
    volumes:
        # DAG folder
        - ./docker-airflow/dags:/usr/local/airflow/dags
        # Add path for external python modules
        - ./src:/home/python_modules
        # Add path for airflow workspace folder
        - ./docker-airflow/workdir:/home/workdir
        # Mount the docker socket from the host (currently my laptop) into the webserver container
        - //var/run/docker.sock:/var/run/docker.sock # double // are necessary for windows host
    ports:
        # Change port to 8081 to avoid Jupyter conflicts
        - 8081:8080
    command: webserver
    healthcheck:
        test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
        interval: 30s
        timeout: 30s
        retries: 3
    networks:
        - mynet
Dockerfile for the webserver container (located in the docker-airflow folder):
FROM puckel/docker-airflow:1.10.4
# Adds DAG folder to the PATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"
# Install the optional packages and change the user to airflow again
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt
# Install docker inside the webserver container
RUN pip install -U pip && pip install docker
ENV SHARE_DIR /usr/local/share
# Install simple text editor for debugging
RUN ["apt-get", "update"]
RUN ["apt-get", "-y", "install", "vim"]
USER airflow
Edit/Update:
After incorporating Noe's answer, I changed the Dockerfile of the webserver container to the following:
FROM puckel/docker-airflow:1.10.4
# Adds DAG folder to the PATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"
# Install the optional packages and change the user to airflow again
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt
# Install docker inside the webserver container
RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share
# Install simple text editor for debugging
RUN ["apt-get", "update"]
RUN ["apt-get", "-y", "install", "vim"]
USER airflow
In addition, I added docker==4.1.0 to the requirements.txt file (referenced in the Dockerfile above), which contains all the packages to be installed inside the webserver container.
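As a quick sanity check that this docker Python package can reach the mounted socket from inside the webserver container, something along the following lines can be run in a Python shell there (it fails with the same permission error as the CLI if the socket is not accessible):
import docker

# Connect to the Docker daemon through the socket mounted from the host.
client = docker.DockerClient(base_url="unix://var/run/docker.sock")
print(client.ping())                                        # True if the daemon is reachable
print([c.name for c in client.containers.list(all=True)])   # roughly the same info as `docker ps --all`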
However, when I now start the services with docker-compose up --build -d, enter the webserver container with docker exec -it <name_of_webserver_container> /bin/bash, and type the simple Docker command docker ps --all, I get the following output:
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1: dial unix /var/run/docker.sock: connect: permission denied
So it seems I still need to grant some rights/privileges, which puzzles me because I already set privileged: true in the webserver section of the docker-compose.yml file. Does anyone know what causes this problem?
Edit/Update/Answer:
After removing USER airflow from the Dockerfile of the webserver container, I can now use Docker commands inside the webserver container! (The container processes then run as root, which is allowed to access the mounted Docker socket, whereas the airflow user was not.)
Answer 0 (score: 1)
What you are trying to do is called Docker in Docker.
You need to do the following:
Add
RUN curl -sSL https://get.docker.com/ | sh
to your Dockerfile.
You did well mounting
//var/run/docker.sock:/var/run/docker.sock
Add
privileged: true
to your container.
In your particular case you additionally need to:
Remove RUN pip install -U pip && pip install docker, because it is already being installed.
Remove USER airflow; you need to use the default user or the root user.
Add docker==4.1.0 to requirements.txt.
Answer 1 (score: 0)
@Noe's approach also worked for me. I additionally had to run wsl --set-version Ubuntu-20.04 2.
Here is the Dockerfile + docker-compose for Airflow 2.1.1.
Dockerfile
FROM apache/airflow:2.1.1
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/opt/airflow/dags"
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt
RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share
Docker Compose
---
version: '3'
x-airflow-common:
  &airflow-common
  build: .
  # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.1}
  #
  # group_add:
  #   - 0
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    # Need as env var otherwise container crashes while exiting. Airflow Issue # 13487
    AIRFLOW__CORE__ENABLE_XCOM_PICKLING: 'true'
    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 5 # Just to have a fast load in the front-end. Do not use in prod w/ config
    # Enable the Airflow API
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-snowflake-connector-python==2.3.10 boto3==1.15.18 botocore==1.18.18 paramiko==2.6.0 docker==5.0.0}
    # PYTHONPATH: "${PYTHONPATH}:/home/python_modules:/opt/airflow/dags"
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    # Pass the Docker Daemon as a volume to allow the webserver containers to start docker images
    # Windows requires a leading double slash (//) to address the Docker socket on the host
    - //var/run/docker.sock:/var/run/docker.sock
  #user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  #user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    # Give extended privileges to the container
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    # Give extended privileges to the container
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  # Runs airflow-db-init and airflow-db-upgrade
  # Creates a new user airflow/airflow
  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

volumes:
  postgres-db-volume:
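For completeness, a minimal sketch of a DAG that drives this setup through the mounted socket could look like the following (assuming the apache-airflow-providers-docker package is available in the image; my-etl-image:latest and the command are placeholders):
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="simple_etl_v2",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_and_load = DockerOperator(
        task_id="extract_and_load",
        image="my-etl-image:latest",               # placeholder image built from the ETL source code
        command="python /app/etl.py",              # placeholder entry point
        docker_url="unix://var/run/docker.sock",   # host daemon via the mounted socket
        network_mode="bridge",
        auto_remove=True,
    )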