How to trigger a daily DAG at local-time midnight instead of midnight UTC

Date: 2017-11-04 15:24:56

Tags: airflow apache-airflow airflow-scheduler

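I'm in the UTC+4 time zone, so when Airflow triggers the nightly ETL it is already 4 AM here. How can I tell Airflow to trigger the run for day ds at 20:00 on day ds-1, while keeping ds = ds?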

According to the docs, keeping all servers on UTC is strongly recommended, which is why I'm looking for an application-level solution.

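Edit: a hacky solution is to schedule it to run every day at 20:00 UTC, i.e. on the "previous" day, and then use tomorrow_ds instead of ds in the job. But this still looks odd in the Airflow UI, which shows the UTC execution time.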
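To make the offset arithmetic in the question concrete, here is a small sketch that runs outside Airflow. It assumes a fixed UTC+4 offset (true for Asia/Dubai, which has no DST) and mimics Airflow's ds and tomorrow_ds as plain date strings; local_run_date is a hypothetical helper, not an Airflow API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+4 offset (e.g. Asia/Dubai, which has no DST).
LOCAL_TZ = timezone(timedelta(hours=4))


def local_run_date(execution_date_utc: datetime) -> str:
    """Hypothetical helper: local calendar date (YYYY-MM-DD) of a naive UTC execution date."""
    aware = execution_date_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(LOCAL_TZ).strftime("%Y-%m-%d")


# A run scheduled with '0 20 * * *' whose execution date is 2017-11-08 20:00 UTC
run = datetime(2017, 11, 8, 20, 0)
ds = run.strftime("%Y-%m-%d")                                 # mimics Airflow's ds
tomorrow_ds = (run + timedelta(days=1)).strftime("%Y-%m-%d")  # mimics tomorrow_ds
print(ds, tomorrow_ds, local_run_date(run))
```

Running it prints 2017-11-08 2017-11-09 2017-11-09: the local date of a 20:00 UTC run is the next UTC day, which is why the tomorrow_ds workaround gives the local date.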

3 answers:

Answer 0 (score: 6)

The schedule interval can also be a cron expression, which means you can easily run it at 20:00 UTC. Combined with "user_defined_filters", a little trickery gets you the behaviour you want:

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime
import pytz

tz = pytz.timezone('Asia/Dubai')


def localize_utc_tz(d):
    # Interpret the (UTC) execution date and convert it to Asia/Dubai local time
    return tz.fromutc(d)


default_args = {
    'start_date': datetime(2017, 11, 8),
}

dag = DAG(
    'plus_4_utc',
    default_args=default_args,
    # 20:00 UTC is midnight in UTC+4
    schedule_interval='0 20 * * *',
    # Makes "| localtz" available inside Jinja templates
    user_defined_filters={
        'localtz': localize_utc_tz,
    },
)

task = BashOperator(
    task_id='task_for_testing_file_log_handler',
    dag=dag,
    bash_command='echo UTC {{ ts }}, Local {{ execution_date | localtz }} next {{ next_execution_date | localtz }}',
)

Output:

UTC 2017-11-08T20:00:00, Local 2017-11-09 00:00:00+04:00 next 2017-11-10 00:00:00+04:00

You have to be careful about the "type" of the variables you use. For example, ds is a string, not a datetime object, which means the filter won't work on it.
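For readers without pytz, here is a stdlib-only sketch of the same filter idea. It assumes a fixed UTC+4 offset as a stand-in for 'Asia/Dubai' (correct there because Dubai has no DST) and, like the answer's filter, treats a naive datetime as UTC. The explicit type check is an addition for illustration: it makes the "ds is a string" pitfall fail loudly.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+4 offset standing in for pytz's 'Asia/Dubai'.
LOCAL_TZ = timezone(timedelta(hours=4))


def localize_utc_tz(d: datetime) -> datetime:
    """Treat a naive datetime as UTC and convert it to local time (like tz.fromutc)."""
    if not isinstance(d, datetime):
        # ds / yesterday_ds are plain 'YYYY-MM-DD' strings, not datetimes
        raise TypeError(f"expected datetime, got {type(d).__name__}")
    return d.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ)


execution_date = datetime(2017, 11, 8, 20, 0)
print(localize_utc_tz(execution_date))  # 2017-11-09 00:00:00+04:00

try:
    localize_utc_tz("2017-11-08")  # passing ds: rejected because it is a str
except TypeError as exc:
    print("ds is not a datetime:", exc)
```

The function could be registered in user_defined_filters exactly like localize_utc_tz in the answer; only the timezone source differs.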

Answer 1 (score: 1)

I ran into the same problem. I have a job that runs every day at half past midnight (00:30) local time.

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
import pendulum

local_tz = pendulum.timezone("Asia/Calcutta")

args = {
    'owner': 'ganesh',
    'depends_on_past': False,
    'start_date': datetime(2020, 3, 25, tzinfo=local_tz),
    'email': ['abcd@test.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_id='test1',
    default_args=args,
    # start_date carries tzinfo=local_tz, so this cron is read as 00:30 Asia/Calcutta time
    schedule_interval='30 00 * * *',
)

first_date = BashOperator(
    task_id='first_date',
    bash_command='date',
    dag=dag,
    env=None,
    output_encoding='utf-8',
)

second_date = BashOperator(
    task_id='second_date',
    bash_command='echo date',
    dag=dag,
    env=None,
    output_encoding='utf-8',
)

first_date >> second_date
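As I understand the timezone support added in Airflow 1.10, a timezone-aware start_date makes the cron schedule follow that zone's wall clock. The sketch below only does the arithmetic, outside Airflow: it shows which UTC instant the 00:30 Asia/Calcutta slot corresponds to. It assumes a fixed UTC+05:30 offset (India has no DST), and local_slot_in_utc is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Calcutta is a fixed UTC+05:30 offset (India has no DST).
IST = timezone(timedelta(hours=5, minutes=30))


def local_slot_in_utc(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Hypothetical helper: the UTC instant of a local IST wall-clock time."""
    return datetime(year, month, day, hour, minute, tzinfo=IST).astimezone(timezone.utc)


# '30 00 * * *' evaluated in local time: 00:30 IST on 2020-03-26
print(local_slot_in_utc(2020, 3, 26, 0, 30))  # 2020-03-25 19:00:00+00:00
```

So a scheduler running on UTC servers would fire this DAG at 19:00 UTC on the previous calendar day, without any change to the servers' clock.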



Answer 2 (score: 0)

Could you write a Python util that rewrites the timezone-based schedule into UTC? https://github.com/bloomberg/tzcron/blob/master/tzcron.py
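A minimal version of such a util, for the simple daily case only, might look like the sketch below. daily_cron_to_utc is hypothetical and is not tzcron's API. It assumes a fixed UTC offset (no DST) and a schedule of the form "M H * * *"; shifting the hour across midnight would also shift any explicit day-of-week or day-of-month field, which this sketch does not handle.

```python
from datetime import timedelta


def daily_cron_to_utc(minute: int, hour: int, utc_offset: timedelta) -> str:
    """Hypothetical helper: convert a daily 'M H * * *' schedule in a fixed-offset
    zone into the equivalent UTC cron string. Only valid for wildcard day fields
    and zones without DST."""
    local_minutes = hour * 60 + minute
    offset_minutes = int(utc_offset.total_seconds() // 60)
    # Wrap around midnight in either direction
    utc_minutes = (local_minutes - offset_minutes) % (24 * 60)
    return f"{utc_minutes % 60} {utc_minutes // 60} * * *"


print(daily_cron_to_utc(0, 0, timedelta(hours=4)))              # 0 20 * * *
print(daily_cron_to_utc(30, 0, timedelta(hours=5, minutes=30)))  # 0 19 * * *
```

The first call reproduces the '0 20 * * *' schedule used in answer 0 for local midnight in UTC+4.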

Edit: a recent commit made Airflow timezone-aware: https://github.com/apache/incubator-airflow/commit/f1ab56cc6ad3b9419af94aaa333661c105185883