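I'm in UTC+4, so by the time Airflow triggers the nightly ETL it is already 4:00 AM here. How can I tell Airflow to trigger the run for day ds at 20:00 on day ds-1, while keeping ds = ds?
According to the docs, it is strongly recommended to keep all servers on UTC, which is why I'm looking for an application-level solution.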
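Edit: a hacky solution is to schedule it every day at 20:00, i.e. on the "previous" day, and then use tomorrow_ds instead of ds in the job. But that still looks odd in the Airflow UI, because the UI shows the UTC execution time.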
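This is a minimal stdlib sketch of why the tomorrow_ds trick lines up with the local date. It assumes a fixed UTC+4 offset, which fits Asia/Dubai because that zone has no DST. An execution date of 20:00 UTC on day D is already 00:00 on day D+1 locally, so adding one day to the UTC date gives the local date. The helper names local_date and utc_date_plus_one are hypothetical. The second one only mimics what Airflow's tomorrow_ds renders.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+4 offset standing in for Asia/Dubai (no DST there).
LOCAL_TZ = timezone(timedelta(hours=4))


def local_date(execution_date_utc: datetime) -> str:
    """Local calendar date (YYYY-MM-DD) of a timezone-aware UTC datetime."""
    return execution_date_utc.astimezone(LOCAL_TZ).strftime("%Y-%m-%d")


def utc_date_plus_one(execution_date_utc: datetime) -> str:
    """Hypothetical helper mimicking Airflow's tomorrow_ds template variable:
    the execution date plus one day, formatted as YYYY-MM-DD."""
    return (execution_date_utc + timedelta(days=1)).strftime("%Y-%m-%d")


if __name__ == "__main__":
    # An execution_date of 20:00 UTC on 2017-11-08 is midnight of 2017-11-09 in UTC+4.
    execution_date = datetime(2017, 11, 8, 20, 0, tzinfo=timezone.utc)
    print(execution_date.strftime("%Y-%m-%d"))  # 2017-11-08 (what ds shows)
    print(local_date(execution_date))            # 2017-11-09
    print(utc_date_plus_one(execution_date))     # 2017-11-09 (what tomorrow_ds shows)
```

The equivalence only holds because the run is pinned to 20:00 UTC. For an execution time before 20:00 UTC, the local date and the UTC date coincide, and the "+1 day" shift would be wrong.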
Answer 0 (score: 6)
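The schedule interval can also be a "cron expression", which means you can easily run it at 20:00 UTC. Combined with "user_defined_filters", this lets you get the behavior you want with a bit of trickery: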
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime
import pytz

tz = pytz.timezone('Asia/Dubai')


def localize_utc_tz(d):
    # Treat a naive datetime (such as execution_date) as UTC and convert it to Asia/Dubai time
    return tz.fromutc(d)


default_args = {
    'start_date': datetime(2017, 11, 8),
}

dag = DAG(
    'plus_4_utc',
    default_args=default_args,
    schedule_interval='0 20 * * *',  # 20:00 UTC = 00:00 in UTC+4
    user_defined_filters={
        'localtz': localize_utc_tz,
    },
)

task = BashOperator(
    task_id='task_for_testing_file_log_handler',
    dag=dag,
    bash_command='echo UTC {{ ts }}, Local {{ execution_date | localtz }} next {{ next_execution_date | localtz }}',
)
Output:
UTC 2017-11-08T20:00:00, Local 2017-11-09 00:00:00+04:00 next 2017-11-10 00:00:00+04:00
You have to be careful about the "type" of the variables you use. For example, ts and ds are strings, not datetime objects, which means the filter won't work on them.
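To illustrate the caveat about string-typed variables, here is a hedged stdlib stand-in for the localtz filter. A fixed UTC+4 offset replaces pytz.timezone('Asia/Dubai'), and localtz and localtz_from_string are hypothetical names. The filter works on a datetime such as execution_date. It fails on a string such as ts, and works again once the string is parsed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+4 offset standing in for pytz.timezone('Asia/Dubai').
LOCAL_TZ = timezone(timedelta(hours=4))


def localtz(d):
    """Hypothetical stand-in for localize_utc_tz: a naive datetime is treated
    as UTC, then converted to local time. Fails on strings (no .tzinfo)."""
    if d.tzinfo is None:
        d = d.replace(tzinfo=timezone.utc)
    return d.astimezone(LOCAL_TZ)


def localtz_from_string(s: str) -> datetime:
    """Hypothetical variant: parse an ISO 8601 string such as ts first."""
    return localtz(datetime.fromisoformat(s))


if __name__ == "__main__":
    execution_date = datetime(2017, 11, 8, 20, 0)  # naive UTC, as in older Airflow
    print(localtz(execution_date))                  # 2017-11-09 00:00:00+04:00
    ts = "2017-11-08T20:00:00"
    try:
        localtz(ts)
    except AttributeError as exc:
        print("filter fails on a string:", exc)
    print(localtz_from_string(ts))                  # 2017-11-09 00:00:00+04:00
```

In a template, this corresponds to applying the filter to execution_date rather than to ts or ds. The parsing variant relies on ts being an ISO 8601 rendering of the execution date.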
Answer 1 (score: 1)
I ran into the same problem. I have a job that needs to run every day at 00:30 local time.
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
import pendulum

# A timezone-aware start_date makes the cron schedule below refer to local (IST) time
local_tz = pendulum.timezone("Asia/Calcutta")

args = {
    'owner': 'ganesh',
    'depends_on_past': False,
    'start_date': datetime(2020, 3, 25, tzinfo=local_tz),
    'email': ['abcd@test.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_id='test1',
    default_args=args,
    schedule_interval='30 00 * * *',  # every day at 00:30 Asia/Calcutta time
)

first_date = BashOperator(
    task_id='first_date',
    bash_command='date',
    dag=dag,
    env=None,
    output_encoding='utf-8',
)

second_date = BashOperator(
    task_id='second_date',
    bash_command='echo date',
    dag=dag,
    env=None,
    output_encoding='utf-8',
)

first_date >> second_date
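As far as we understand Airflow 1.10's timezone support, the key in the DAG above is the pendulum-aware start_date. With it, the cron expression '30 00 * * *' is evaluated in Asia/Calcutta rather than in UTC. The stdlib sketch below only shows what those local ticks correspond to in UTC. It uses a hypothetical helper, local_daily_fire_times_utc, and assumes a fixed UTC+05:30 offset, which is reasonable since India has no DST. It does not reproduce Airflow's scheduler or its execution_date semantics.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Calcutta modelled as a fixed UTC+05:30 offset (no DST in India).
IST = timezone(timedelta(hours=5, minutes=30))


def local_daily_fire_times_utc(start_local: datetime, hour: int, minute: int, n: int):
    """Hypothetical helper: the first n ticks of a daily 'minute hour * * *'
    cron evaluated in start_local's timezone, converted to UTC."""
    first = start_local.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if first < start_local:
        first += timedelta(days=1)  # today's tick already passed; start tomorrow
    return [(first + timedelta(days=i)).astimezone(timezone.utc) for i in range(n)]


if __name__ == "__main__":
    start = datetime(2020, 3, 25, tzinfo=IST)  # mirrors start_date in the DAG above
    for t in local_daily_fire_times_utc(start, hour=0, minute=30, n=3):
        print(t.isoformat())  # each local 00:30 tick is 19:00 UTC the previous day
```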
Answer 2 (score: 0)
Could you write a Python utility that rewrites a timezone-based schedule into UTC? https://github.com/bloomberg/tzcron/blob/master/tzcron.py
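This is a minimal sketch of the kind of utility suggested above. It is limited to daily 'minute hour * * *' schedules and a fixed UTC offset. daily_cron_to_utc is a hypothetical name, not tzcron's API. Timezones with DST need a per-date conversion like tzcron's, which this sketch does not attempt.

```python
from datetime import timedelta
from typing import Tuple

MINUTES_PER_DAY = 24 * 60


def daily_cron_to_utc(minute: int, hour: int, utc_offset: timedelta) -> Tuple[str, int]:
    """Hypothetical helper: rewrite a local daily cron 'minute hour * * *' for a
    fixed UTC offset into the equivalent UTC cron. Returns (cron, day_shift),
    where day_shift is -1, 0 or +1 relative to the local calendar day."""
    local_minutes = hour * 60 + minute
    utc_minutes = local_minutes - int(utc_offset.total_seconds() // 60)
    # divmod with a positive divisor wraps negative values into the previous day
    day_shift, utc_minutes = divmod(utc_minutes, MINUTES_PER_DAY)
    utc_hour, utc_minute = divmod(utc_minutes, 60)
    return f"{utc_minute} {utc_hour} * * *", day_shift


if __name__ == "__main__":
    print(daily_cron_to_utc(0, 0, timedelta(hours=4)))              # ('0 20 * * *', -1)
    print(daily_cron_to_utc(30, 0, timedelta(hours=5, minutes=30)))  # ('0 19 * * *', -1)
```

The day shift matters. Local midnight in UTC+4 becomes '0 20 * * *' on the previous UTC day, which is why the ds shown by Airflow lags the local date by one.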
Edit: a recent commit makes Airflow timezone-aware: https://github.com/apache/incubator-airflow/commit/f1ab56cc6ad3b9419af94aaa333661c105185883