I am trying to use Apache Airflow to create workflows. Basically, I have manually installed Airflow inside an Anaconda kernel on my own server.
This is how I run a simple DAG:
export AIRFLOW_HOME=~/airflow/airflow_home # my airflow home
export AIRFLOW=~/.conda/.../lib/python2.7/site-packages/airflow/bin
export PATH=~/.conda/.../bin:$AIRFLOW:$PATH # my kernel
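With AIRFLOW_HOME and PATH set this way, the stock CLI can be used to confirm that Airflow picks up the DAG file (a quick sketch; these are standard Airflow 1.8 commands):
airflow initdb                        # create the SQLite metadata database under AIRFLOW_HOME (first run only)
airflow list_dags                     # the 'tutorial' DAG defined below should show up here
airflow list_tasks tutorial --tree    # prints the task hierarchy of the 'tutorial' DAG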
When I do the same thing with airflow test, it only works for a particular task in isolation. For example, in dag1: task1 >> task2
airflow test dag1 task2 2017-06-22
I expected it to run task1 first and then task2, but it only runs task2 on its own.
Does anyone have any idea about this? Thanks a lot in advance!
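For comparison, the standard way to run every task of a DAG in dependency order for a given date is a backfill over that date (a sketch with the Airflow 1.8 CLI, using the dag1 example above):
airflow backfill dag1 -s 2017-06-22 -e 2017-06-22   # runs task1 first, then task2, respecting the declared dependency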
Here is my code:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'txuantu',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['tran.xuantu@axa.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG(
    'tutorial', default_args=default_args, schedule_interval=timedelta(1))


def python_op1(ds, **kwargs):
    print(ds)
    return 0


def python_op2(ds, **kwargs):
    print(str(kwargs))
    return 0


# t1, t2 and t3 are examples of tasks created by instantiating operators
# t1 = BashOperator(
#     task_id='bash_operator',
#     bash_command='echo {{ ds }}',
#     dag=dag)

t1 = PythonOperator(
    task_id='python_operator1',
    python_callable=python_op1,
    # provide_context=True,
    dag=dag)

t2 = PythonOperator(
    task_id='python_operator2',
    python_callable=python_op2,
    # provide_context=True,
    dag=dag)

t2.set_upstream(t1)
Airflow: v1.8.0, using the SequentialExecutor with SQLite.
airflow run tutorial python_operator2 2015-06-01
Here is the error message:
[2017-06-28 22:49:15,336] {models.py:167} INFO - Filling up the DagBag from /home/txuantu/airflow/airflow_home/dags
[2017-06-28 22:49:16,069] {base_executor.py:50} INFO - Adding to queue: airflow run tutorial python_operator2 2015-06-01T00:00:00 --mark_success --local -sd DAGS_FOLDER/tutorial.py
[2017-06-28 22:49:16,072] {sequential_executor.py:40} INFO - Executing command: airflow run tutorial python_operator2 2015-06-01T00:00:00 --mark_success --local -sd DAGS_FOLDER/tutorial.py
[2017-06-28 22:49:16,765] {models.py:167} INFO - Filling up the DagBag from /home/txuantu/airflow/airflow_home/dags/tutorial.py
[2017-06-28 22:49:16,986] {base_task_runner.py:112} INFO - Running: ['bash', '-c', u'airflow run tutorial python_operator2 2015-06-01T00:00:00 --mark_success --job_id 1 --raw -sd DAGS_FOLDER/tutorial.py']
[2017-06-28 22:49:17,373] {base_task_runner.py:95} INFO - Subtask: [2017-06-28 22:49:17,373] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-06-28 22:49:17,694] {base_task_runner.py:95} INFO - Subtask: [2017-06-28 22:49:17,693] {models.py:167} INFO - Filling up the DagBag from /home/txuantu/airflow/airflow_home/dags/tutorial.py
[2017-06-28 22:49:17,899] {base_task_runner.py:95} INFO - Subtask: [2017-06-28 22:49:17,899] {models.py:1120} INFO - Dependencies not met for <TaskInstance: tutorial.python_operator2 2015-06-01 00:00:00 [None]>, dependency 'Trigger Rule' FAILED: Task's trigger rule 'all_success' requires all upstream tasks to have succeeded, but found 1 non-success(es). upstream_tasks_state={'successes': 0, 'failed': 0, 'upstream_failed': 0, 'skipped': 0, 'done': 0}, upstream_task_ids=['python_operator1']
[2017-06-28 22:49:22,011] {jobs.py:2083} INFO - Task exited with return code 0
Answer 0 (score: 0)
If you only want to run python_operator2, you should execute:
airflow run tutorial python_operator2 2015-06-01 --ignore_dependencies=False
If you want to execute the entire DAG and run both tasks, use trigger_dag:
airflow trigger_dag tutorial
For reference, airflow test will "run a task without checking for dependencies."
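Applied to the DAG above, that approach would look roughly like this (a sketch based on this answer's airflow run usage and the task ids from the question):
airflow run tutorial python_operator1 2015-06-01   # run the upstream task first so its success is recorded in the metadata DB
airflow run tutorial python_operator2 2015-06-01   # the 'all_success' trigger rule is now satisfied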
Answer 1 (score: 0)
Finally, I found the answer to my problem. Basically, I thought Airflow did lazy loading, but it seems it doesn't. So the answer is: instead of:
t2.set_upstream(t1)
it should be:
t1.set_downstream(t2)
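For reference, a minimal sketch of how that line sits at the end of the DAG file above; in Airflow's API the three forms below all declare the same t1-before-t2 edge:
# dependency wiring for the DAG defined in the question
t1.set_downstream(t2)    # the form suggested in this answer: t1 runs before t2
# t2.set_upstream(t1)    # equivalent, declared from the downstream task's side
# t1 >> t2               # equivalent bitshift form, as in the question's "task1 >> task2"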