Apache Airflow cannot find AWS credentials when using boto3 in a DAG

Date: 2020-08-24 11:36:08

Tags: amazon-web-services docker airflow

We are migrating to Apache Airflow running on ECS Fargate.

The problem we are facing is simple: we have a DAG in which one of the tasks communicates with an external AWS service (for example, downloading a file from S3). This is the DAG script:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

from datetime import datetime, timedelta


# default arguments for each task
default_args = {
    'owner': 'thomas',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}


dag = DAG('test_s3_download',
          default_args=default_args,
          schedule_interval=None) 

TEST_BUCKET = 'bucket-dev'
TEST_KEY = 'BlueMetric/dms.json'


# simple download task
def download_file(bucket, key):
    import boto3
    s3 = boto3.resource('s3')
    print(s3.Object(bucket, key).get()['Body'].read())


download_from_s3 = PythonOperator(
    task_id='download_from_s3',
    python_callable=download_file,
    op_kwargs={'bucket': TEST_BUCKET, 'key': TEST_KEY},
    dag=dag)


sleep_task = BashOperator(
    task_id='sleep_for_1',
    bash_command='sleep 1',
    dag=dag)


download_from_s3.set_downstream(sleep_task)

As in other Docker setups, we create a config file under ~/.aws inside the container, with the following content:

[default]
region = eu-west-1

and as long as the container runs inside the AWS boundary, it resolves every request without us specifying credentials.
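For reference, that config file only sets a region; it carries no access keys, so boto3 still has to obtain credentials from another source (environment variables, the instance metadata service, or an ECS task role). A small stdlib-only sketch illustrating this:

```python
import configparser

# Same content the container's ~/.aws/config holds
cfg = configparser.ConfigParser()
cfg.read_string("[default]\nregion = eu-west-1\n")

print(cfg["default"]["region"])               # eu-west-1
# No credential keys are present in this file, so boto3 must
# resolve credentials elsewhere in its provider chain:
print("aws_access_key_id" in cfg["default"])  # False
```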

This is the Dockerfile we are using:

FROM puckel/docker-airflow:1.10.7

USER root

COPY entrypoint.sh /entrypoint.sh
COPY requirements.txt /requirements.txt

RUN apt-get update

RUN ["chmod", "+x", "/entrypoint.sh"]

RUN mkdir -p /home/airflow/.aws \
&& touch /home/airflow/.aws/config \
&& echo '[default]' > /home/airflow/.aws/config \
&& echo 'region = eu-west-1' >> /home/airflow/.aws/config

RUN ["chown", "-R", "airflow", "/home/airflow"]

USER airflow

ENTRYPOINT ["/entrypoint.sh"]

# # Expose webUI and flower respectively
EXPOSE 8080
EXPOSE 5555

Everything works like a charm: the directory and ownership changes complete successfully. But when the DAG runs, it fails with:

...
...
File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[2020-08-24 11:15:02,125] {{taskinstance.py:1117}} INFO - All retries failed; marking task as FAILED

So we suspect that Airflow's worker node is actually running as a different user.
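Since boto3 looks for ~/.aws relative to the home directory of whichever user the process runs as, a quick diagnostic (a sketch to run as a one-off task or via `docker exec`, not part of the original DAG) can confirm the worker's identity:

```python
import getpass
import os

# Which user is the worker process running as, and where does it
# believe "~" is? boto3 reads ~/.aws/config from this home directory.
print("user:", getpass.getuser())
print("home:", os.path.expanduser("~"))
print("config exists:", os.path.isfile(os.path.expanduser("~/.aws/config")))
```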

Does anyone know what is going on? Thanks for any advice/suggestions.

1 Answer:

Answer 0 (score: 0):

Create a proper task_role_arn for the task definition. That role is the one assumed by the processes triggered inside the container. A side note: the error should not read

Unable to locate credentials

which is misleading, but rather

Access Denied: you don't have permission to s3:GetObject
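When the task definition does carry a task_role_arn, ECS injects AWS_CONTAINER_CREDENTIALS_RELATIVE_URI into the container, and botocore fetches temporary credentials from the metadata endpoint at 169.254.170.2. A stdlib-only diagnostic sketch (not part of the DAG) to verify this wiring from inside the container:

```python
import os


def credentials_endpoint(env=os.environ):
    """Return the ECS container credentials URL, or None if the task
    has no task role (in which case boto3 cannot use this source)."""
    # ECS sets this variable only when the task definition carries a
    # task_role_arn; botocore resolves credentials through it.
    uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    return None if uri is None else "http://169.254.170.2" + uri


if __name__ == "__main__":
    url = credentials_endpoint()
    print(url or "No credentials endpoint found - check task_role_arn")
```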