Airflow PythonOperator: accessing the ds variable inside a function

Asked: 2018-08-23 12:48:05

Tags: python jinja2 amazon-redshift airflow

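I know this has been asked before, but none of the existing answers solved it for me, and I'm starting to go a bit crazy! I'm stuck and would really appreciate some help.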

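I have a DAG with a Python operator that runs a SQL query and writes the result to a .csv file. The second operator just returns True, to fill out the DAG. I can't seem to access the ds variable inside the function. I want it so that I can pass it to the query.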

from airflow.models import Variable, DAG
from airflow.hooks import HttpHook, PostgresHook
from airflow.operators import PythonOperator
from datetime import datetime, timedelta
import json


sql_path = Variable.get("sql_path")
date = Variable.get("ds")
first_date = Variable.get("ds")

print date

def get_redshift_data(ds,**kwargs):
    pg_hook = PostgresHook(postgres_conn_id='redshift')
    params = {'window_start_date':date,'window_end_date':first_date}
    with open(sql_path+"/native.sql") as f:
        sql_file = f.read() % (params)
    df2 = pg_hook.get_pandas_df(sql_file)
    df2.to_csv("test_1.csv", encoding = "utf-8")

def print_test(ds, **kwargs):
    return True

args = {
    'owner': 'Bob',
    'depends_on_past': False,
    'start_date': datetime.utcnow(),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

#Define DAG
dag = DAG(dag_id='native_etl',
          default_args=args,
          schedule_interval='0 * * * 1,2,3,4,5',
          dagrun_timeout=timedelta(seconds=30))

#Task 1 run native query with date parameters and output to file


get_redshift_native = PythonOperator(
                      task_id='native_etl',
                      provide_context=True,
                      python_callable=get_redshift_data,
                      dag=dag
                      )

#Task 2 print test

get_test = PythonOperator(
                      task_id='native_test',
                      provide_context=True,
                      python_callable=print_test,
                      dag=dag
)

get_redshift_native >> get_test

if __name__ == "__main__":
    dag.cli()

When I look at the logs, I get the following:

raise KeyError('Variable {} does not exist'.format(key))

I have also tried accessing the variable through kwargs["ds"] and {{ ds }}, both inside and outside the operator.

The query itself is fine and contains the template text:

WHERE trunc(pd.server_timestamp) between '%(window_start_date)s' AND '%(window_end_date)s'
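
As a minimal sketch of the substitution step, the snippet below fills those named placeholders with Python's %-formatting and a dict, the same mechanism get_redshift_data uses on the file contents. The helper name render_query and the sample dates are hypothetical and only serve the illustration; no Airflow or Redshift connection is involved.

```python
# Query fragment copied from the question; the %(name)s placeholders
# are filled by the % operator when it is given a dict.
SQL_TEMPLATE = (
    "WHERE trunc(pd.server_timestamp) "
    "between '%(window_start_date)s' AND '%(window_end_date)s'"
)


def render_query(template, window_start_date, window_end_date):
    """Fill the named placeholders (hypothetical helper, not part of the DAG)."""
    params = {
        "window_start_date": window_start_date,
        "window_end_date": window_end_date,
    }
    return template % params


# Sample dates are made up for the illustration.
rendered = render_query(SQL_TEMPLATE, "2018-08-22", "2018-08-23")
print(rendered)
```

One thing to watch with this approach: any literal percent sign elsewhere in the SQL file has to be written as %% or the formatting step will raise an error.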

1 Answer:

Answer 0 (score: 3)

You should use templates_dict to pass the ds template into the PythonOperator.

https://github.com/apache/incubator-airflow/blob/master/airflow/operators/python_operator.py#L56

For example, if I want to pass the execution_date to a PythonOperator:

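def transform_py(**kwargs):
    # templates_dict arrives already rendered, so 'today' is a plain date string
    today = kwargs.get('templates_dict').get('today', None)
    ...

with dag:
    # Jinja template; Airflow renders it to the execution date as YYYYMMDD
    today = "{{ ds_nodash }}"

    transform = PythonOperator(
        task_id='test_date',
        python_callable=transform_py,
        templates_dict={
            'today': today,
        },
        provide_context=True)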
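
Applied to the question, the same idea might look roughly like the sketch below. It only covers the part that turns the rendered templates_dict into the query parameters. The helper name build_window_params and the simulated context are hypothetical, and the call at the bottom stands in for what Airflow would pass once "{{ ds }}" has been rendered, so it runs without Airflow installed.

```python
def build_window_params(**kwargs):
    """Build the query parameters from the rendered templates_dict.

    Hypothetical helper: in the DAG this logic would sit at the top of
    get_redshift_data, replacing the module-level Variable.get("ds") calls.
    """
    templates = kwargs.get("templates_dict") or {}
    return {
        "window_start_date": templates.get("window_start_date"),
        "window_end_date": templates.get("window_end_date"),
    }


# Simulated context: assumes the operator was declared with
# templates_dict={'window_start_date': '{{ ds }}', 'window_end_date': '{{ ds }}'}
# and that Airflow rendered both values to the run's date.
simulated_context = {
    "templates_dict": {
        "window_start_date": "2018-08-23",
        "window_end_date": "2018-08-23",
    }
}

params = build_window_params(**simulated_context)
print(params)
```

The KeyError in the question comes from Variable.get("ds"): that call looks up a user-defined Airflow Variable, while ds is a per-run template value. In Airflow 1.x with provide_context=True, the context is also passed to the callable as keyword arguments, so the ds parameter already in the question's function signature should carry the same date and could be used directly instead.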