Is it possible for a pipeline in Airflow to be independent of any schedule?

Time: 2018-05-23 22:06:11

Tags: airflow google-cloud-composer

I need pipelines that can be executed manually or programmatically. Can I use Airflow for this? Right now it looks like every workflow has to be tied to a schedule.

3 Answers:

Answer 0 (score: 4)

When creating the DAG, just set schedule_interval to None:

from airflow import DAG

dag = DAG('workflow_name',
          template_searchpath='path',   # placeholder path to your template directory
          schedule_interval=None,       # no schedule: the DAG runs only when triggered
          default_args=default_args)    # default_args defined elsewhere in your DAG file

From the Airflow Manual:

"Each DAG may or may not have a schedule, which informs how DAG Runs are created. schedule_interval is defined as a DAG argument, and receives preferably a cron expression as a str, or a datetime.timedelta object."

The manual then goes on to list some cron "presets", one of which is None.
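With no schedule, the DAG only runs when something triggers it. As a minimal sketch of triggering it programmatically, here is one way using Airflow's local API client (the import path and Client signature are from Airflow 1.x and are an assumption for other versions; 'workflow_name' matches the DAG above, and the conf dict is optional):

from airflow.api.client.local_client import Client

# The local client talks to the metadata database directly; no REST endpoint needed.
client = Client(None, None)
client.trigger_dag(dag_id='workflow_name',
                   conf={'triggered_by': 'another_process'})  # exposed to tasks as dag_run.conf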

Answer 1 (score: 2)

In Airflow, every DAG needs a start date and a schedule interval*, for example hourly:

from datetime import datetime, timedelta

from airflow import DAG

dag = DAG(
    dag_id='my_dag',
    schedule_interval=timedelta(hours=1),  # run every hour
    start_date=datetime(2018, 5, 23),
)

(Without a schedule, how would Airflow know when to start running it?)

As an alternative to a cron schedule, you can set the schedule to @once to run it just once.

*One exception: you can omit the schedule for externally triggered DAGs, because Airflow will not schedule them itself.

However, if you omit the schedule, you need to trigger the DAG externally somehow (e.g., with the airflow trigger_dag CLI command or the "Trigger DAG" button in the web UI). If you want to be able to invoke a DAG programmatically, for instance when some condition arises in another DAG, you can do that with the TriggerDagRunOperator. You may also hear this referred to as an externally triggered DAG.

Here is a usage example taken from Airflow's example DAGs:

File 1 - example_trigger_controller_dag.py

"""This example illustrates the use of the TriggerDagRunOperator. There are 2
entities at work in this scenario:
1. The Controller DAG - the DAG that conditionally executes the trigger
2. The Target DAG - DAG being triggered (in example_trigger_target_dag.py)

This example illustrates the following features :
1. A TriggerDagRunOperator that takes:
  a. A python callable that decides whether or not to trigger the Target DAG
  b. An optional params dict passed to the python callable to help in
     evaluating whether or not to trigger the Target DAG
  c. The id (name) of the Target DAG
  d. The python callable can add contextual info to the DagRun created by
     way of adding a Pickleable payload (e.g. dictionary of primitives). This
     state is then made available to the TargetDag
2. A Target DAG : c.f. example_trigger_target_dag.py
"""

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from datetime import datetime

import pprint

pp = pprint.PrettyPrinter(indent=4)


def conditionally_trigger(context, dag_run_obj):
    """This function decides whether or not to Trigger the remote DAG"""
    c_p = context['params']['condition_param']
    print("Controller DAG : conditionally_trigger = {}".format(c_p))
    if context['params']['condition_param']:
        dag_run_obj.payload = {'message': context['params']['message']}
        pp.pprint(dag_run_obj.payload)
        return dag_run_obj


# Define the DAG
dag = DAG(dag_id='example_trigger_controller_dag',
          default_args={"owner": "airflow",
                        "start_date": datetime.utcnow()},
          schedule_interval='@once')


# Define the single task in this controller example DAG
trigger = TriggerDagRunOperator(task_id='test_trigger_dagrun',
                                trigger_dag_id="example_trigger_target_dag",
                                python_callable=conditionally_trigger,
                                params={'condition_param': True,
                                        'message': 'Hello World'},
                                dag=dag)

File 2 - example_trigger_target_dag.py

from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.models import DAG
from datetime import datetime

import pprint
pp = pprint.PrettyPrinter(indent=4)

# This example illustrates the use of the TriggerDagRunOperator. There are 2
# entities at work in this scenario:
# 1. The Controller DAG - the DAG that conditionally executes the trigger
#    (in example_trigger_controller.py)
# 2. The Target DAG - DAG being triggered
#
# This example illustrates the following features :
# 1. A TriggerDagRunOperator that takes:
#   a. A python callable that decides whether or not to trigger the Target DAG
#   b. An optional params dict passed to the python callable to help in
#      evaluating whether or not to trigger the Target DAG
#   c. The id (name) of the Target DAG
#   d. The python callable can add contextual info to the DagRun created by
#      way of adding a Pickleable payload (e.g. dictionary of primitives). This
#      state is then made available to the TargetDag
# 2. A Target DAG : c.f. example_trigger_target_dag.py

args = {
    'start_date': datetime.utcnow(),
    'owner': 'airflow',
}

dag = DAG(
    dag_id='example_trigger_target_dag',
    default_args=args,
    schedule_interval=None)


def run_this_func(ds, **kwargs):
    print("Remotely received value of {} for key=message".
          format(kwargs['dag_run'].conf['message']))


run_this = PythonOperator(
    task_id='run_this',
    provide_context=True,
    python_callable=run_this_func,
    dag=dag)


# You can also access the DagRun object in templates
bash_task = BashOperator(
    task_id="bash_task",
    bash_command='echo "Here is the message: '
                 '{{ dag_run.conf["message"] if dag_run else "" }}" ',
    dag=dag)
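Tying the two files together: the controller DAG's single @once run calls conditionally_trigger, which sees condition_param=True, attaches the payload to the new DagRun, and triggers example_trigger_target_dag; the target reads that payload back from dag_run.conf, so bash_task should print something like:

Here is the message: Hello World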

Answer 2 (score: 2)

Yes, this can be achieved by passing None to schedule_interval in default_args.

Check this documentation on DAG Runs.

For example:

from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 12, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': None, # Check this line
}
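For completeness, a minimal sketch of attaching these default_args to a DAG (the dag_id here is a placeholder; schedule_interval can equally be passed directly as a DAG argument, as shown in the other answers):

from airflow import DAG

dag = DAG(
    dag_id='my_manual_dag',      # placeholder name
    default_args=default_args,
    schedule_interval=None,      # also set at the DAG level, so the DAG is never scheduled
)

The DAG can then be started manually from the web UI or triggered programmatically as described above.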