I'm using boto2 to access my S3 bucket and I can't get consistent results through the Apache Airflow scheduler. I have a script that gets my bucket from S3 and uploads a zip file to it, and I run that script on the Airflow scheduler with a BashOperator. Everything seems to work fine when I run the script manually or on my own machine, but I've noticed that it throws a 403 Forbidden error after the machine has been sitting idle. For example, the script runs once a day and executes successfully through Airflow all week, then as soon as the weekend starts it fails with a 403. Any idea why this pattern of behavior occurs? My account is supposed to have full access to S3, and Airflow is running from a Docker container.
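Roughly, the bucket access in the script looks like the sketch below; only the upload_s3 function name and the get_bucket call match the traceback further down exactly, the connection and the Key-based upload around them are a simplified illustration rather than the exact code:

import boto
from boto.s3.key import Key

def upload_s3(filename):
    # Connect with whatever credentials boto2 discovers (env vars, ~/.boto, or an instance role)
    s3 = boto.connect_s3()
    # Note: validate='False' is the string 'False', which is truthy, so boto
    # still issues a HEAD request against the bucket here (see the traceback below)
    aabucket = s3.get_bucket('nixhydra-appannie', validate='False')
    # Upload the zip under its own filename
    k = Key(aabucket)
    k.key = filename
    k.set_contents_from_filename(filename)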
Here is the DAG configuration script I'm running:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

# Run once a day at 18:00
schedule_interval = "00 18 * * *"

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(year=2017, month=3, day=21, hour=18, minute=0, second=0),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 4,
    'retry_delay': timedelta(minutes=2),
}

dag = DAG(
    dag_id='appannie_bash_operator_v10',
    default_args=args,
    schedule_interval=schedule_interval,
)

# Runs the scrape/upload script that talks to S3
t1 = BashOperator(
    task_id='get_json',
    bash_command='python ~/appannie/appannie_scrape_update.py',
    dag=dag,
)

t2 = BashOperator(
    task_id='export_s3',
    bash_command='python --version',
    dag=dag,
)

t2.set_upstream(t1)
And here is the error I get in the Airflow task log:
[2017-04-04 21:50:28,996] {models.py:1219} INFO - Executing <Task(BashOperator): get_json> on 2017-04-03 18:00:00
[2017-04-04 21:50:29,156] {bash_operator.py:55} INFO - tmp dir root location:
/tmp
[2017-04-04 21:50:29,157] {bash_operator.py:64} INFO - Temporary script location :/tmp/airflowtmp5_nqrG//tmp/airflowtmp5_nqrG/get_json0aZng2
[2017-04-04 21:50:29,158] {bash_operator.py:65} INFO - Running command: python ~/appannie/appannie_scrape_update.py
[2017-04-04 21:50:29,162] {bash_operator.py:73} INFO - Output:
[2017-04-04 21:51:06,057] {bash_operator.py:77} INFO - JSON DOWNLOAD SUCCESSFUL!
[2017-04-04 21:51:06,368] {bash_operator.py:77} INFO - Traceback (most recent call last):
[2017-04-04 21:51:06,678] {bash_operator.py:77} INFO - File "/usr/local/airflow/appannie/appannie_scrape_update.py", line 244, in <module>
[2017-04-04 21:51:06,679] {bash_operator.py:77} INFO - upload_s3(x2)
[2017-04-04 21:51:06,680] {bash_operator.py:77} INFO - File "/usr/local/airflow/appannie/appannie_scrape_update.py", line 41, in upload_s3
[2017-04-04 21:51:06,680] {bash_operator.py:77} INFO - aabucket = s3.get_bucket('nixhydra-appannie',validate='False')
[2017-04-04 21:51:06,681] {bash_operator.py:77} INFO - File "/usr/local/airflow/.local/lib/python2.7/site-packages/boto/s3/connection.py", line 506, in get_bucket
[2017-04-04 21:51:06,681] {bash_operator.py:77} INFO - return self.head_bucket(bucket_name, headers=headers)
[2017-04-04 21:51:06,682] {bash_operator.py:77} INFO - File "/usr/local/airflow/.local/lib/python2.7/site-packages/boto/s3/connection.py", line 539, in head_bucket
[2017-04-04 21:51:06,682] {bash_operator.py:77} INFO - raise err
[2017-04-04 21:51:06,683] {bash_operator.py:77} INFO - boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
[2017-04-04 21:51:06,683] {bash_operator.py:77} INFO -
[2017-04-04 21:51:06,684] {bash_operator.py:80} INFO - Command exited with return code 1
[2017-04-04 21:51:06,685] {models.py:1286} ERROR - Bash command failed
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1245, in run
result = task_copy.execute(context=context)
File "/usr/local/lib/python2.7/dist-packages/airflow/operators/bash_operator.py", line 83, in execute
raise AirflowException("Bash command failed")
AirflowException: Bash command failed
[2017-04-04 21:51:06,686] {models.py:1298} INFO - Marking task as UP_FOR_RETRY
[2017-04-04 21:51:07,699] {models.py:1327} ERROR - Bash command failed
[2017-04-04 21:53:18,231] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags/appannie_bash_operator_v10.py
[2017-04-04 21:53:21,470] {models.py:154} INFO - Filling up the DagBag from /usr/local/airflow/dags/appannie_bash_operator_v10.py
[2017-04-04 21:53:21,515] {models.py:1196} INFO -