We are migrating to Apache Airflow running on ECS Fargate.
The problem we are facing is simple. We have a simple DAG, one of whose tasks communicates with some external service in AWS (for example, downloading a file from S3). This is the DAG script:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta

# default arguments for each task
default_args = {
    'owner': 'thomas',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('test_s3_download',
          default_args=default_args,
          schedule_interval=None)

TEST_BUCKET = 'bucket-dev'
TEST_KEY = 'BlueMetric/dms.json'

# simple download task
def download_file(bucket, key):
    import boto3
    s3 = boto3.resource('s3')
    print(s3.Object(bucket, key).get()['Body'].read())

download_from_s3 = PythonOperator(
    task_id='download_from_s3',
    python_callable=download_file,
    op_kwargs={'bucket': TEST_BUCKET, 'key': TEST_KEY},
    dag=dag)

sleep_task = BashOperator(
    task_id='sleep_for_1',
    bash_command='sleep 1',
    dag=dag)

download_from_s3.set_downstream(sleep_task)
As we have done on other occasions with docker, we create a config file under ~/.aws inside the docker container, with the following content:
[default]
region = eu-west-1
and, as long as the container is inside the AWS boundary, it resolves every request without needing to specify credentials.
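To make that expectation concrete, here is a minimal sketch (assuming boto3 is installed and the container has some IAM role attached; nothing here comes from the original setup) that asks STS which identity boto3's default credential chain actually resolves to:

import boto3

# Ask STS which identity the default credential chain resolved.
# On ECS Fargate this should be the task role; if the chain finds
# nothing, this call raises NoCredentialsError, just like the DAG task.
sts = boto3.client('sts', region_name='eu-west-1')
print(sts.get_caller_identity()['Arn'])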
This is the Dockerfile we are using:
FROM puckel/docker-airflow:1.10.7
USER root
COPY entrypoint.sh /entrypoint.sh
COPY requirements.txt /requirements.txt
RUN apt-get update
RUN ["chmod", "+x", "/entrypoint.sh"]
RUN mkdir -p /home/airflow/.aws \
&& touch /home/airflow/.aws/config \
&& echo '[default]' > /home/airflow/.aws/config \
&& echo 'region = eu-west-1' >> /home/airflow/.aws/config
RUN ["chown", "-R", "airflow", "/home/airflow"]
USER airflow
ENTRYPOINT ["/entrypoint.sh"]
# Expose webUI and flower respectively
EXPOSE 8080
EXPOSE 5555
and everything works like a charm. The directory creation and ownership change complete successfully, but when the DAG runs it fails with:
...
...
File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[2020-08-24 11:15:02,125] {taskinstance.py:1117} INFO - All retries failed; marking task as FAILED
So we think that the Airflow worker process is actually running as a different user.
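One way to check that hypothesis is a small throwaway debug task (a sketch only; the task_id and its placement in the same DAG are made up here) that prints the OS user, the HOME directory, and whether the ~/.aws/config written in the Dockerfile is visible to the worker:

import getpass
import os

from airflow.operators.python_operator import PythonOperator

def whoami():
    # Print the user the worker runs the task as, its HOME directory,
    # and whether the config file created in the Dockerfile is visible.
    print('user:', getpass.getuser())
    print('HOME:', os.environ.get('HOME'))
    print('config exists:', os.path.exists(os.path.expanduser('~/.aws/config')))

debug_task = PythonOperator(
    task_id='whoami_debug',
    python_callable=whoami,
    dag=dag)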
Does anyone know what might be going on? Thanks for any guidance/advice.
Answer 0 (score: 0)
Create the proper task_role_arn for the task definition. This role is the one assumed by the processes launched inside the container. One further note: the error should then not read
Unable to locate credentials
but rather
Access Denied: you don't have permission to s3:GetObject.
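For reference, a minimal sketch of registering a Fargate task definition with a task role attached, using boto3 (the family name, image, and role ARNs below are placeholders, not values from this setup):

import boto3

ecs = boto3.client('ecs', region_name='eu-west-1')

# Placeholder names and ARNs -- substitute your own account, roles and image.
response = ecs.register_task_definition(
    family='airflow-worker',
    requiresCompatibilities=['FARGATE'],
    networkMode='awsvpc',
    cpu='512',
    memory='1024',
    # executionRoleArn lets ECS itself pull the image and ship logs;
    # taskRoleArn is the role the code inside the container assumes,
    # which is what boto3 inside the DAG task ends up using.
    executionRoleArn='arn:aws:iam::123456789012:role/ecsTaskExecutionRole',
    taskRoleArn='arn:aws:iam::123456789012:role/airflow-s3-access',
    containerDefinitions=[{
        'name': 'airflow',
        'image': '123456789012.dkr.ecr.eu-west-1.amazonaws.com/airflow:latest',
        'essential': True,
    }],
)
print(response['taskDefinition']['taskDefinitionArn'])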