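I know this has been asked before, but none of the existing answers worked for me. I'm starting to go a little crazy! I'm quite confused and would really appreciate your help.
I have a DAG with a Python operator that runs a SQL query and writes the output to a .csv file. The second operator just returns True to complete the DAG. I can't seem to access the ds variable inside the function. I want to do this so that I can pass it to the query.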
from airflow.models import Variable, DAG
from airflow.hooks import HttpHook, PostgresHook
from airflow.operators import PythonOperator
from datetime import datetime, timedelta
import json
sql_path = Variable.get("sql_path")
date = Variable.get("ds")
first_date = Variable.get("ds")
print date
def get_redshift_data(ds,**kwargs):
    pg_hook = PostgresHook(postgres_conn_id='redshift')
    params = {'window_start_date':date,'window_end_date':first_date}
    with open(sql_path+"/native.sql") as f:
        sql_file = f.read() % (params)
    df2 = pg_hook.get_pandas_df(sql_file)
    df2.to_csv("test_1.csv", encoding = "utf-8")
def print_test(ds, **kwargs):
    return True
args = {
    'owner': 'Bob',
    'depends_on_past': False,
    'start_date': datetime.utcnow(),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
#Define DAG
dag = DAG(dag_id='native_etl',
          default_args=args,
          schedule_interval='0 * * * 1,2,3,4,5',
          dagrun_timeout=timedelta(seconds=30))
#Task 1 run native query with date parameters and output to file
get_redshift_native = PythonOperator(
    task_id='native_etl',
    provide_context=True,
    python_callable=get_redshift_data,
    dag=dag
)
#Task 2 print test
get_test = PythonOperator(
    task_id='native_test',
    provide_context=True,
    python_callable=print_test,
    dag=dag
)
get_redshift_native >> get_test
if __name__ == "__main__":
    dag.cli()
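When I look at the logs, I get the following:
raise KeyError('Variable {} does not exist'.format(key))
I have also tried accessing the variable via kwargs["ds"] and {{ ds }}, both inside and outside the operator.
The query itself works and contains the template text: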
WHERE trunc(pd.server_timestamp) between '%(window_start_date)s' AND '%(window_end_date)s'
Answer 0 (score: 3)
You should use templates_dict to pass the ds template into the PythonOperator.
https://github.com/apache/incubator-airflow/blob/master/airflow/operators/python_operator.py#L56
For example, if I want to pass execution_date to a PythonOperator:
def transform_py(**kwargs):
    today = kwargs.get('templates_dict').get('today', None)
    ...
with dag:
    today = "{{ ds_nodash }}"
    transform = PythonOperator(
        task_id='test_date',
        python_callable=transform_py,
        templates_dict={
            'today': today,
        },
        provide_context=True)
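Applied to the query in the question, the same idea might look like the sketch below. It deliberately avoids importing Airflow so it can be run on its own: the kwargs are simulated by hand to mimic what the operator would pass once "{{ ds }}" has been rendered. The helper name render_query, the table name some_table, and the date 2018-01-01 are assumptions made for illustration, not part of the original post.

```python
# Query text with the same %(...)s placeholders as in the question.
# "some_table" is a hypothetical table name used only for this sketch.
SQL_TEMPLATE = (
    "SELECT * FROM some_table pd "
    "WHERE trunc(pd.server_timestamp) "
    "between '%(window_start_date)s' AND '%(window_end_date)s'"
)


def render_query(sql_template, **kwargs):
    """Fill the SQL placeholders from the rendered templates_dict.

    Hypothetical helper: with provide_context=True, the operator passes
    the rendered templates_dict to the callable as a keyword argument.
    """
    templates = kwargs.get('templates_dict') or {}
    params = {
        'window_start_date': templates['window_start_date'],
        'window_end_date': templates['window_end_date'],
    }
    return sql_template % params


# Simulated kwargs: roughly what the callable would receive after
# "{{ ds }}" is rendered for a run dated 2018-01-01 (assumed date).
simulated_kwargs = {
    'templates_dict': {
        'window_start_date': '2018-01-01',
        'window_end_date': '2018-01-01',
    }
}

query = render_query(SQL_TEMPLATE, **simulated_kwargs)
print(query)
```

In the DAG itself, the operator would be given templates_dict={'window_start_date': '{{ ds }}', 'window_end_date': '{{ ds }}'} together with provide_context=True, as in the example above. Note that ds is a value from the task's template context rather than an Airflow Variable, which would explain why Variable.get("ds") raises the KeyError shown in the question.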