我有一个虚拟的DAG,我想通过将其start_date
设置为today
并将其调度间隔设置为daily
这是DAG代码:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# -*- airflow: DAG -*-
import logging
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime
logger = logging.getLogger("DummyDAG")
def execute_python_function():
logging.info("HEY YO !!!")
return True
dag = DAG(dag_id='dummy_dag',
start_date=datetime.today())
start = DummyOperator(task_id='start', dag=dag)
end = DummyOperator(task_id='end', dag=dag)
py_operator = PythonOperator(task_id='exec_function',
python_callable=execute_python_function,
dag=dag)
start >> py_operator >> end
在Airflow 1.9.0中,当我执行airflow trigger_dag -e 20190701
时,将创建DAG运行,然后创建,计划和执行任务实例。
但是,在Airflow 1.10.2中,也创建了DAG运行任务实例,但它们停留在None
状态。
对于两个版本,depends_on_past均为False
以下是Airflow 1.9.0中start
任务的详细信息(一段时间后成功执行)
Task Instance Details
Dependencies Blocking Task From Getting Scheduled
Dependency: Reason
Dagrun Running: Task instance's dagrun was not in the 'running' state but in the state 'success'.
Task Instance State: Task is in the 'success' state which is not a valid state for execution. The task must be cleared in order to be run.
Execution Date: The execution date is 2019-07-10T00:00:00 but this is before the task's start date 2019-07-11T08:45:18.230876.
Execution Date: The execution date is 2019-07-10T00:00:00 but this is before the task's DAG's start date 2019-07-11T08:45:18.230876.
Task instance attribute
Attribute Value
dag_id dummy_dag
duration None
end_date 2019-07-10 16:32:10.372976
execution_date 2019-07-10 00:00:00
generate_command <function generate_command at 0x7fc9fcc85b90>
hostname airflow-worker-5dc5b999b6-2l5cp
is_premature False
job_id None
key ('dummy_dag', 'start', datetime.datetime(2019, 7, 10, 0, 0))
log <logging.Logger object at 0x7fca014e7f10>
log_filepath /home/airflow/gcs/logs/dummy_dag/start/2019-07-10T00:00:00.log
log_url https://i39907f7014685e91-tp.appspot.com/admin/airflow/log?dag_id=dummy_dag&task_id=start&execution_date=2019-07-10T00:00:00
logger <logging.Logger object at 0x7fca014e7f10>
mark_success_url https://i39907f7014685e91-tp.appspot.com/admin/airflow/success?task_id=start&dag_id=dummy_dag&execution_date=2019-07-10T00:00:00&upstream=false&downstream=false
max_tries 0
metadata MetaData(bind=None)
next_try_number 2
operator None
pid 180712
pool None
previous_ti None
priority_weight 3
queue default
queued_dttm None
run_as_user None
start_date 2019-07-10 16:32:08.483531
state success
task <Task(DummyOperator): start>
task_id start
test_mode False
try_number 2
unixname airflow
Task Attributes
Attribute Value
adhoc False
dag <DAG: dummy_dag>
dag_id dummy_dag
depends_on_past False
deps set([<TIDep(Not In Retry Period)>, <TIDep(Previous Dagrun State)>, <TIDep(Trigger Rule)>])
downstream_list [<Task(PythonOperator): exec_function>]
downstream_task_ids ['exec_function']
email None
email_on_failure True
email_on_retry True
end_date None
execution_timeout None
log <logging.Logger object at 0x7fc9e2085350>
logger <logging.Logger object at 0x7fc9e2085350>
max_retry_delay None
on_failure_callback None
on_retry_callback None
on_success_callback None
owner Airflow
params {}
pool None
priority_weight 1
priority_weight_total 3
queue default
resources {'disk': {'_qty': 512, '_units_str': 'MB', '_name': 'Disk'}, 'gpus': {'_qty': 0, '_units_str': 'gpu(s)', '_name': 'GPU'}, 'ram': {'_qty': 512, '_units_str': 'MB', '_name': 'RAM'}, 'cpus': {'_qty': 1, '_units_str': 'core(s)', '_name': 'CPU'}}
retries 0
retry_delay 0:05:00
retry_exponential_backoff False
run_as_user None
schedule_interval 1 day, 0:00:00
sla None
start_date 2019-07-11 08:45:18.230876
task_concurrency None
task_id start
task_type DummyOperator
template_ext []
template_fields ()
trigger_rule all_success
ui_color #e8f7e4
ui_fgcolor #000
upstream_list []
upstream_task_ids []
wait_for_downstream False
Airflow 1.10.2中启动任务的详细信息
Task Instance Details
Dependencies Blocking Task From Getting Scheduled
Dependency Reason
Execution Date The execution date is 2019-07-11T00:00:00+00:00 but this is before the task's start date 2019-07-11T08:53:32.593360+00:00.
Execution Date The execution date is 2019-07-11T00:00:00+00:00 but this is before the task's DAG's start date 2019-07-11T08:53:32.593360+00:00.
Task Instance Attributes
Attribute Value
dag_id dummy_dag
duration None
end_date None
execution_date 2019-07-11T00:00:00+00:00
executor_config {}
generate_command <function generate_command at 0x7f4621301578>
hostname
is_premature False
job_id None
key ('dummy_dag', 'start', <Pendulum [2019-07-11T00:00:00+00:00]>, 1)
log <logging.Logger object at 0x7f4624883350>
log_filepath /home/airflow/gcs/logs/dummy_dag/start/2019-07-11T00:00:00+00:00.log
log_url https://a15d189066a5c65ee-tp.appspot.com/admin/airflow/log?dag_id=dummy_dag&task_id=start&execution_date=2019-07-11T00%3A00%3A00%2B00%3A00
logger <logging.Logger object at 0x7f4624883350>
mark_success_url https://a15d189066a5c65ee-tp.appspot.com/admin/airflow/success?task_id=start&dag_id=dummy_dag&execution_date=2019-07-11T00%3A00%3A00%2B00%3A00&upstream=false&downstream=false
max_tries 0
metadata MetaData(bind=None)
next_try_number 1
operator None
pid None
pool None
previous_ti None
priority_weight 3
queue default
queued_dttm None
raw False
run_as_user None
start_date None
state None
task <Task(DummyOperator): start>
task_id start
test_mode False
try_number 1
unixname airflow
Task Attributes
Attribute Value
adhoc False
dag <DAG: dummy_dag>
dag_id dummy_dag
depends_on_past False
deps set([<TIDep(Previous Dagrun State)>, <TIDep(Trigger Rule)>, <TIDep(Not In Retry Period)>])
downstream_list [<Task(PythonOperator): exec_function>]
downstream_task_ids set(['exec_function'])
email None
email_on_failure True
email_on_retry True
end_date None
execution_timeout None
executor_config {}
inlets []
lineage_data None
log <logging.Logger object at 0x7f460b467e10>
logger <logging.Logger object at 0x7f460b467e10>
max_retry_delay None
on_failure_callback None
on_retry_callback None
on_success_callback None
outlets []
owner Airflow
params {}
pool None
priority_weight 1
priority_weight_total 3
queue default
resources {'disk': {'_qty': 512, '_units_str': 'MB', '_name': 'Disk'}, 'gpus': {'_qty': 0, '_units_str': 'gpu(s)', '_name': 'GPU'}, 'ram': {'_qty': 512, '_units_str': 'MB', '_name': 'RAM'}, 'cpus': {'_qty': 1, '_units_str': 'core(s)', '_name': 'CPU'}}
retries 0
retry_delay 0:05:00
retry_exponential_backoff False
run_as_user None
schedule_interval 1 day, 0:00:00
sla None
start_date 2019-07-11T08:53:32.593360+00:00
task_concurrency None
task_id start
task_type DummyOperator
template_ext []
template_fields ()
trigger_rule all_success
ui_color #e8f7e4
ui_fgcolor #000
upstream_list []
upstream_task_ids set([])
wait_for_downstream False
weight_rule downstream
答案 0 :(得分:1)
IMO,这不是版本问题。如果您查看日志,则会看到类似以下消息:
执行日期:
执行日期是2019-07-10T00:00:00但这是 任务开始日期之前的 。2019-07-11T08:45:18.230876。
执行日期是您在trigger_dag
命令中输入的日期,而DAG的开始日期正在更改,因为Python的datetime.today()
返回了当前时间。为此,您可以执行以下操作:
airflow@e3bc9a0a7a3e:~$ airflow trigger_dag dummy_dag -e 20190702
然后转到http://localhost:8080/admin/airflow/task?dag_id=dummy_dag&task_id=start&execution_date=2019-07-02T00%3A00%3A00%2B00%3A00(或任何相应的URL)并刷新页面。您应该每次都看到Dependency > Execution date
发生变化。
对于您而言,这将是有问题的,因为您尝试触发过去的DAG。更好的方法是指定一个静态日期或使用Airflow的util方法中的任何一种来解决该问题:
dag = DAG(dag_id='dummy_dag',
start_date=datetime(2019, 7, 11, 0, 0))
否则,如果要重新处理历史数据,可以使用airflow backfill
更新
在对注释进行了澄清之后,我们找到了另一种按需触发属性为schedule_interval=None
的DAG的方法。