How to execute docker commands from inside a docker container on a Windows host

Posted: 2020-01-17 16:52:41

Tags: python windows docker docker-compose airflow

Here is the situation: I have successfully developed a very simple ETL process locally that pulls data from some remote location and writes the unprocessed data into a MongoDB container on my local Windows machine. Now I want to schedule this process with Apache Airflow, using the DockerOperator for every task, i.e. I want to build a Docker image of my source code and then execute the source code inside that image via the DockerOperator. Since I am working on a Windows machine, I can only run Airflow from inside a Docker container in order to actually trigger the Airflow DAGs. The Airflow container (called webserver below) and the Mongo container (called mongo below) are both specified in the docker-compose.yml file that you can find at the end.
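For illustration, a minimal sketch of what such a DAG might look like; the dag id, task id, image name, and entry point are placeholders, and the import path is the one used by Airflow 1.10.x:

from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator  # import path in Airflow 1.10.x

# Hypothetical example: one DockerOperator task per ETL step; names and image are placeholders
with DAG(
    dag_id="simple_etl",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = DockerOperator(
        task_id="extract_raw_data",
        image="my-etl-image:latest",              # image built from the ETL source code (placeholder)
        command="python /app/extract.py",         # placeholder entry point inside the image
        docker_url="unix://var/run/docker.sock",  # talk to the host daemon through the mounted socket
        network_mode="bridge",
        auto_remove=True,                         # remove the sibling container when the task finishes
    )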

As far as I understand, every time my simple ETL DAG is triggered and a DockerOperator is executed, the webserver container creates a new "sibling" container for each ETL task, the source code is executed inside this new container, and once the task is finished the new container is removed again. If my understanding so far is correct, the webserver container needs to be able to execute docker commands such as docker build... in order to create these sibling containers.

To test this theory, I added the volumes /var/run/docker.sock:/var/run/docker.sock and /usr/bin/docker:/usr/bin/docker to the definition of the webserver container in the docker-compose.yml file, so that the webserver container can use the docker daemon of my host (Windows) machine. I then started the webserver and mongo containers with docker-compose up -d, entered the webserver container with docker exec -it <name_of_webserver_container> /bin/bash, and tried the simple docker command docker ps --all. However, the output of this command was bash: docker: command not found. So it seems that Docker is not properly installed inside the webserver container. How can I make sure that Docker is installed inside the webserver container, so that these additional sibling containers can be created?
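Side note: since the Dockerfile below already runs pip install docker, the mounted socket can also be probed from Python without the docker CLI. A minimal check, assuming the docker Python package imports cleanly inside the container:

import docker

# Connect through the socket mounted at /var/run/docker.sock
client = docker.from_env()

# Equivalent of `docker ps --all`: list both running and exited containers
for container in client.containers.list(all=True):
    print(container.name, container.status)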

Below you can find the relevant parts of the docker-compose.yml file and the Dockerfile of the webserver container.

The docker-compose.yml file, located in the project root directory:

webserver:
        build: ./docker-airflow
        restart: always
        privileged: true
        depends_on:
            - postgres  # some other service I cut out from this post
            - mongo
            - mongo-express  # some other service I cut out from this post
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
            - POSTGRES_USER=some_user
            - POSTGRES_PASSWORD=some_pw
            - POSTGRES_DB=airflowdb
        volumes:
            # DAG folder
            - ./docker-airflow/dags:/usr/local/airflow/dags
            # Add path for external python modules
            - ./src:/home/python_modules
            # Add path for airflow workspace folder
            - ./docker-airflow/workdir:/home/workdir
            # Mount the docker socket from the host (currently my laptop) into the webserver container
            - //var/run/docker.sock:/var/run/docker.sock  # double // are necessary for windows host
        ports:
            # Change port to 8081 to avoid Jupyter conflicts
            - 8081:8080
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
        networks:
            - mynet

The Dockerfile of the webserver container, located in the docker-airflow folder:

FROM puckel/docker-airflow:1.10.4

# Add the external modules and DAG folders to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"

# Install the optional packages and change the user to airflow again
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt

# Install docker inside the webserver container
RUN pip install -U pip && pip install docker
ENV SHARE_DIR /usr/local/share

# Install simple text editor for debugging
RUN ["apt-get", "update"]
RUN ["apt-get", "-y", "install", "vim"]

USER airflow

EDIT/UPDATE

After incorporating Noe's answer, I changed the Dockerfile of the webserver container to the following:

FROM puckel/docker-airflow:1.10.4

# Add the external modules and DAG folders to the PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/usr/local/airflow/dags"

# Install the optional packages and change the user to airflow again
COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt

# Install docker inside the webserver container
RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share

# Install simple text editor for debugging
RUN ["apt-get", "update"]
RUN ["apt-get", "-y", "install", "vim"]

USER airflow

In addition, I added docker==4.1.0 to the requirements.txt file (referenced in the Dockerfile above), which contains all the packages to be installed inside the webserver container.

However, when I now start the services with docker-compose up --build -d, enter the webserver container with docker exec -it <name_of_webserver_container> /bin/bash, and type the simple docker command docker ps --all, I get the following output:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1: dial unix /var/run/docker.sock: connect: permission denied

So it seems that I still need to grant some rights/privileges, which confuses me because I already put privileged: true in the webserver section of the docker-compose.yml file. Does anyone know the cause of this problem?
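One way to see why the daemon refuses the connection is to compare the ownership and mode of the mounted socket with the user the process runs as. A minimal diagnostic sketch, assuming it is run inside the webserver container:

import os
import stat

# Who owns the mounted Docker socket and which permission bits it carries
info = os.stat("/var/run/docker.sock")
print("socket uid/gid:", info.st_uid, info.st_gid)
print("socket mode:   ", stat.filemode(info.st_mode))

# The user this process runs as; access is denied when it is neither root
# nor a member of the group that owns the socket
print("current uid/gid:", os.getuid(), os.getgid())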

EDIT/UPDATE/ANSWER

After removing USER airflow from the Dockerfile of the webserver container, I am able to use docker commands inside the webserver container!

2 Answers:

Answer 0 (score: 1)

What you are trying to do is called docker in docker.

You need to do the following:

  • Install the Docker client inside the container

Add RUN curl -sSL https://get.docker.com/ | sh

  • Mount the docker socket

You already mount it correctly with //var/run/docker.sock:/var/run/docker.sock

  • Run the container in privileged mode

Add privileged: true to your container

In your specific case you need to:

  • Remove RUN pip install -U pip && pip install docker, because we already install it
  • Remove USER airflow; you need to use the default user or the root user
  • Add docker==4.1.0 to requirements.txt

Answer 1 (score: 0)

@Noe's approach also worked for me. I additionally had to upgrade WSL for Ubuntu from V1 to V2 with wsl --set-version Ubuntu-20.04 2.

Here are the Dockerfile and docker-compose file for Airflow 2.1.1.

Dockerfile

FROM apache/airflow:2.1.1

ENV PYTHONPATH "${PYTHONPATH}:/home/python_modules:/opt/airflow/dags"

COPY requirements.txt requirements.txt
USER root
RUN pip install -r requirements.txt

RUN curl -sSL https://get.docker.com/ | sh
ENV SHARE_DIR /usr/local/share

Docker Compose

---
    version: '3'
    x-airflow-common:
      &airflow-common
      build: .
      # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.1}
      # 
      # group_add:
      #   - 0
      environment:
        &airflow-common-env
        AIRFLOW__CORE__EXECUTOR: CeleryExecutor
        AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
        AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
        AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
        AIRFLOW__CORE__FERNET_KEY: ''
        AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
        AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
        # Need as env var otherwise container crashes while exiting. Airflow Issue # 13487
        AIRFLOW__CORE__ENABLE_XCOM_PICKLING: 'true'
        AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 5 # Just to have a fast load in the front-end. Do not use in prod w/ config 
        # Enable the Airflow API
        AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
        # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-snowflake-connector-python==2.3.10 boto3==1.15.18 botocore==1.18.18 paramiko==2.6.0 docker==5.0.0}
        # PYTHONPATH: "${PYTHONPATH}:/home/python_modules:/opt/airflow/dags"
      volumes:
        - ./dags:/opt/airflow/dags
        - ./logs:/opt/airflow/logs
        - ./plugins:/opt/airflow/plugins
        # Pass the Docker Daemon as a volume to allow the webserver containers to start docker images
        # Windows requires a leading double slash (//) to address the Docker socket on the host
        - //var/run/docker.sock:/var/run/docker.sock
      #user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}" 
      #user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-0}" 
      depends_on:
        redis:
          condition: service_healthy
        postgres:
          condition: service_healthy
    
    services:
      postgres:
        image: postgres:13
        environment:
          POSTGRES_USER: airflow
          POSTGRES_PASSWORD: airflow
          POSTGRES_DB: airflow
        volumes:
          - postgres-db-volume:/var/lib/postgresql/data
        healthcheck:
          test: ["CMD", "pg_isready", "-U", "airflow"]
          interval: 5s
          retries: 5
        restart: always
    
      redis:
        image: redis:latest
        ports:
          - 6379:6379
        healthcheck:
          test: ["CMD", "redis-cli", "ping"]
          interval: 5s
          timeout: 30s
          retries: 50
        restart: always
    
      airflow-webserver:
        <<: *airflow-common
        # Give extended privileges to the container
        command: webserver
        ports:
          - 8080:8080
        healthcheck:
          test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
    
      airflow-scheduler:
        <<: *airflow-common
        command: scheduler
        healthcheck:
          test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
    
      airflow-worker:
        <<: *airflow-common
        # Give extended privileges to the container
        command: celery worker
        healthcheck:
          test:
            - "CMD-SHELL"
            - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
      
      # Runs airflow-db-init and airflow-db-upgrade
      # Creates a new user airflow/airflow
      airflow-init:
        <<: *airflow-common
        command: version
        environment:
          <<: *airflow-common-env
          _AIRFLOW_DB_UPGRADE: 'true'
          _AIRFLOW_WWW_USER_CREATE: 'true'
          _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
          _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    
      flower:
        <<: *airflow-common
        command: celery flower
        ports:
          - 5555:5555
        healthcheck:
          test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
          interval: 10s
          timeout: 10s
          retries: 5
        restart: always
    
    volumes:
      postgres-db-volume:
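To actually start sibling containers from a DAG with this setup, the DockerOperator can be pointed at the mounted socket. A minimal sketch, assuming apache-airflow-providers-docker is listed in requirements.txt; the dag id and image are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator  # Airflow 2.x provider import

# Hypothetical DAG: run a placeholder image as a sibling container via the mounted socket
with DAG(
    dag_id="docker_sibling_demo",
    start_date=datetime(2021, 7, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_in_docker = DockerOperator(
        task_id="run_in_docker",
        image="hello-world:latest",               # placeholder image
        docker_url="unix://var/run/docker.sock",  # the socket mounted in the compose file above
        network_mode="bridge",
        auto_remove=True,
    )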