Airflow TriggerDagRunOperator: how to change the execution date

Time: 2017-12-14 08:02:21

Tags: triggers airflow

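I noticed that for scheduled tasks the execution date is set according to the following explanation:

"Airflow was developed as a solution for ETL needs. In the ETL world, you typically summarize data. So, if I want to summarize data for 2016-02-19, I would do it at 2016-02-20 midnight GMT, which would be right after all data for 2016-02-19 becomes available."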
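The relationship between the day being summarized and the moment the run actually starts can be sketched with plain datetime arithmetic. This is only an illustration under the assumption of a daily schedule; `first_possible_start` is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta


def first_possible_start(execution_date, schedule_interval=timedelta(days=1)):
    """Return the earliest moment a scheduled run for `execution_date` can start.

    Assumption: the run covers [execution_date, execution_date + interval)
    and is started only once that interval has fully elapsed.
    """
    return execution_date + schedule_interval


# Data for 2016-02-19 is summarized by a run that starts at 2016-02-20 00:00.
print(first_possible_start(datetime(2016, 2, 19)))
```

A run triggered by another DAG does not go through this arithmetic, which is why its execution date ends up being the trigger time instead.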

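However, when a DAG triggers another DAG, the execution time is set to now().

Is there a way to make the triggered DAGs run with the same execution time as the triggering DAG? Of course, I could rewrite the templates and use yesterday_ds, but that is a hacky solution.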

4 Answers:

Answer 0 (score: 3)

The following class extends TriggerDagRunOperator to allow passing the execution date as a string, which is then converted back into a datetime. It is a bit hacky, but it is the only way I found to get the job done.

from datetime import datetime
import logging

from airflow import settings
from airflow.utils.state import State
from airflow.models import DagBag
from airflow.operators.dagrun_operator import TriggerDagRunOperator, DagRunOrder


class MMTTriggerDagRunOperator(TriggerDagRunOperator):
    """
    MMT-patched for passing explicit execution date
    (otherwise it's hard to hook the datetime.now() date).
    Use when you want to explicitly set the execution date on the target DAG
    from the controller DAG.

    Adapted from Paul Elliot's solution on airflow-dev mailing list archives:
    http://mail-archives.apache.org/mod_mbox/airflow-dev/201711.mbox/%3cCAJuWvXgLfipPmMhkbf63puPGfi_ezj8vHYWoSHpBXysXhF_oZQ@mail.gmail.com%3e

    Parameters
    ------------------
    execution_date: str
        the custom execution date (jinja'd)

    Usage Example:
    -------------------
    my_dag_trigger_operator = MMTTriggerDagRunOperator(
        execution_date="{{execution_date}}",
        task_id='my_dag_trigger_operator',
        trigger_dag_id='my_target_dag_id',
        # The callable receives (context, dro); return dro to trigger, None to skip.
        python_callable=lambda context, dro: dro if random.getrandbits(1) else None,
        params={},
        dag=my_controller_dag
    )
    """
    # Lets Jinja render "{{execution_date}}" before execute() runs.
    template_fields = ('execution_date',)

    def __init__(
        self, trigger_dag_id, python_callable, execution_date,
        *args, **kwargs
    ):
        self.execution_date = execution_date
        super(MMTTriggerDagRunOperator, self).__init__(
            trigger_dag_id=trigger_dag_id, python_callable=python_callable,
            *args, **kwargs
        )

    def execute(self, context):
        # The rendered template is a string; convert it back into a datetime.
        run_id_dt = datetime.strptime(self.execution_date, '%Y-%m-%d %H:%M:%S')
        dro = DagRunOrder(run_id='trig__' + run_id_dt.isoformat())
        dro = self.python_callable(context, dro)
        if dro:
            session = settings.Session()
            dbag = DagBag(settings.DAGS_FOLDER)
            trigger_dag = dbag.get_dag(self.trigger_dag_id)
            dr = trigger_dag.create_dagrun(
                run_id=dro.run_id,
                state=State.RUNNING,
                # Pass the parsed datetime rather than the raw string.
                execution_date=run_id_dt,
                conf=dro.payload,
                external_trigger=True)
            logging.info("Creating DagRun {}".format(dr))
            session.add(dr)
            session.commit()
            session.close()
        else:
            logging.info("Criteria not met, moving on")

There is one issue you may run into when using this instead of execution_date=now(): the operator will throw a MySQL error if you try to start a DAG with an identical execution_date twice. This is because execution_date and dag_id are used to create the row index, and rows with identical indexes cannot be inserted.
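The duplicate-key behavior can be reproduced outside Airflow with a small stand-in table. The sketch below uses SQLite rather than MySQL, and the table is a deliberately simplified assumption (two columns plus a unique constraint), not Airflow's real schema; `insert_run` is a hypothetical helper.

```python
import sqlite3

# Simplified stand-in for the dag_run table; not Airflow's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    " dag_id TEXT NOT NULL,"
    " execution_date TEXT NOT NULL,"
    " UNIQUE (dag_id, execution_date))"
)


def insert_run(dag_id, execution_date):
    """Try to record a run; return False if the same key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dag_run (dag_id, execution_date) VALUES (?, ?)",
                (dag_id, execution_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_run("my_target_dag_id", "2017-12-14 00:00:00")
second = insert_run("my_target_dag_id", "2017-12-14 00:00:00")  # same key again
third = insert_run("my_target_dag_id", "2017-12-15 00:00:00")   # different date
print(first, second, third)
```

Only the second insert is rejected, which matches the advice to either clear the old run or use a different datetime.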

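I cannot think of a reason why you would ever want to run two identical DAGs with the same dag_id and execution_date in production, but it is something I ran into while testing, and you should not be alarmed by it. Simply clear the old job or use a different datetime.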
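The string round trip this answer relies on can be tried on its own. The sketch below assumes the rendered template value has the shape '2017-12-14 08:02:21'; `parse_execution_date` and `build_run_id` are hypothetical helper names introduced only for illustration.

```python
from datetime import datetime

# Assumed shape of the rendered "{{execution_date}}" string.
RENDERED_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_execution_date(rendered):
    """Convert the rendered template string back into a datetime."""
    return datetime.strptime(rendered, RENDERED_FORMAT)


def build_run_id(execution_date):
    """Build a run id in the same 'trig__<isoformat>' style as the operator."""
    return "trig__" + execution_date.isoformat()


dt = parse_execution_date("2017-12-14 08:02:21")
print(build_run_id(dt))
```

If the rendered string carries microseconds or a timezone offset, this format string would raise a ValueError, so the format is worth checking against what your Airflow version actually renders.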

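Answer 1 (score: 2)

TriggerDagRunOperator now has an execution_date parameter for setting the execution date of the triggered run. Unfortunately, the parameter is not in the template fields. If it gets added to the template fields (or if you override the operator and change the template_fields value), it can be used like this: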

my_trigger_task = TriggerDagRunOperator(
    task_id='my_trigger_task',
    trigger_dag_id="triggered_dag_id",
    python_callable=conditionally_trigger,
    execution_date='{{execution_date}}',
    dag=dag)

It has not been released yet, but you can see the source here: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/dagrun_operator.py

The commit that made the change is: https://github.com/apache/incubator-airflow/commit/089c996fbd9ecb0014dbefedff232e8699ce6283#diff-41f9029188bd5e500dec9804fed26fb4

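Answer 2 (score: 1)

I improved MMTTriggerDagRunOperator a little. The function checks whether the dag_run already exists; if it does, it restarts the DAG using Airflow's clear function. This lets us create dependencies between DAGs, because being able to carry the execution date over to the triggered DAG opens up a whole range of amazing possibilities. I wonder why this is not the default behavior in Airflow.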

    def execute(self, context):
        # The rendered template is a string; convert it back into a datetime.
        run_id_dt = datetime.strptime(self.execution_date, '%Y-%m-%d %H:%M:%S')
        dro = DagRunOrder(run_id='trig__' + run_id_dt.isoformat())
        dro = self.python_callable(context, dro)
        if dro:
            session = settings.Session()
            dbag = DagBag(settings.DAGS_FOLDER)
            trigger_dag = dbag.get_dag(self.trigger_dag_id)

            if not trigger_dag.get_dagrun(run_id_dt):
                # No run exists for this execution date yet: create one.
                dr = trigger_dag.create_dagrun(
                    run_id=dro.run_id,
                    state=State.RUNNING,
                    execution_date=run_id_dt,
                    conf=dro.payload,
                    external_trigger=True
                )
                logging.info("Creating DagRun {}".format(dr))
                session.add(dr)
                session.commit()
            else:
                # A run already exists: clear it so that it starts over.
                trigger_dag.clear(
                    start_date=run_id_dt,
                    end_date=run_id_dt,
                    only_failed=False,
                    only_running=False,
                    confirm_prompt=False,
                    reset_dag_runs=True,
                    include_subdags=False,
                    dry_run=False
                )
                logging.info("Cleared DagRun {}".format(trigger_dag))

            session.close()
        else:
            logging.info("Criteria not met, moving on")
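The create-or-clear decision in this method can be looked at in isolation. The sketch below replaces the database with a dictionary, so `runs` and `trigger_or_restart` are hypothetical stand-ins and nothing here calls Airflow; it only mirrors the branching.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the DAG-run table: key -> state.
runs = {}


def trigger_or_restart(dag_id, execution_date):
    """Create a run if none exists for this key, otherwise reset it."""
    key = (dag_id, execution_date)
    if key not in runs:
        runs[key] = "running"
        return "created"
    # Same idea as clearing the existing run so that it starts over.
    runs[key] = "running"
    return "cleared"


when = datetime(2017, 12, 14)
print(trigger_or_restart("my_target_dag_id", when))  # first call creates
print(trigger_or_restart("my_target_dag_id", when))  # second call clears
```

Because the second call takes the clear branch, triggering the same execution date twice no longer runs into the duplicate-key error described in answer 0.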

Answer 3 (score: 0)

The experimental API section of Airflow provides a function that lets you trigger a DAG with a specific execution date.
https://github.com/apache/incubator-airflow/blob/master/airflow/api/common/experimental/trigger_dag.py

You can call this function as part of a PythonOperator and achieve the goal.

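So it would look something like this:

from datetime import datetime

from airflow.api.common.experimental.trigger_dag import trigger_dag
from airflow.operators.python_operator import PythonOperator

trigger_operator = PythonOperator(task_id='YOUR_TASK_ID',
                                  python_callable=trigger_dag,
                                  op_args=['dag_id'],
                                  op_kwargs={'execution_date': datetime.now()})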
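Since the question asks for the triggering DAG's own execution date rather than datetime.now(), one possible variation is to wrap the trigger function in a callable that reads execution_date from the task context. The sketch below is an assumption, not part of the answer: `make_trigger_callable` is a hypothetical helper, and `fake_trigger` stands in for trigger_dag so the example runs without Airflow.

```python
from datetime import datetime


def make_trigger_callable(trigger_fn, dag_id):
    """Build a callable that forwards the caller's execution_date.

    Assumption: `trigger_fn` accepts (dag_id, execution_date=...), the way
    trigger_dag is called in the answer above.
    """
    def _trigger(**context):
        return trigger_fn(dag_id, execution_date=context["execution_date"])
    return _trigger


# Stand-in for the real trigger function so the sketch runs without Airflow.
calls = []


def fake_trigger(dag_id, execution_date=None):
    calls.append((dag_id, execution_date))
    return dag_id, execution_date


trigger = make_trigger_callable(fake_trigger, "dag_id")
# Airflow would supply these keyword arguments as the task context.
trigger(execution_date=datetime(2017, 12, 14), ds="2017-12-14")
print(calls)
```

In a real DAG the returned callable would presumably be passed as python_callable to a PythonOperator with provide_context=True, with trigger_dag in place of fake_trigger; that wiring is not shown in the answer and should be checked against your Airflow version.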