How to fix the Airflow error: print_context() missing 1 required positional argument: 'ds'

Time: 2018-11-26 16:39:28

Tags: airflow directed-acyclic-graphs

I have the following problem. ingest_excel.py:

from __future__ import print_function

import time
from builtins import range
from datetime import timedelta
from pprint import pprint

import airflow
from airflow.models import DAG
#from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'rxie',
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(
    dag_id='ingest_excel',
    default_args=args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60),
)

def print_context(**kwargs):
    pprint("DAG info below:")
    pprint(kwargs)
    return 'Whatever you return gets printed in the logs'


t11_extract_excel_to_csv = PythonOperator(
    task_id='t1_extract_excel_to_csv',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)


t12_upload_csv_to_hdfs_parquet = PythonOperator(
    task_id='t12_upload_csv_to_hdfs_parquet',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)


t13_register_parquet_to_impala = PythonOperator(
    task_id='t13_register_parquet_to_impala',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)

t21_text_to_parquet = PythonOperator(
    task_id='t21_text_to_parquet',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)

t22_register_parquet_to_impala = PythonOperator(
    task_id='t22_register_parquet_to_impala',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)

t31_verify_completion = PythonOperator(
    task_id='t31_verify_completion',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)

t32_send_notification = PythonOperator(
    task_id='t32_send_notification',
    provide_context=True,
    python_callable=print_context(),
    op_kwargs=None,
    dag=dag,
)

t11_extract_excel_to_csv >> t12_upload_csv_to_hdfs_parquet
t12_upload_csv_to_hdfs_parquet >> t13_register_parquet_to_impala

t21_text_to_parquet >> t22_register_parquet_to_impala


t13_register_parquet_to_impala >> t31_verify_completion
t22_register_parquet_to_impala >> t31_verify_completion

t31_verify_completion >> t32_send_notification


#if __name__ == "__main__":
#    dag.cli()

In the DAG GUI, it reports:

Broken DAG: [/root/airflow/dags/ingest_excel.py] python_callable param must be callable

This is my first foray into Airflow and I'm still quite new to it. I'd really appreciate it if someone could shed some light on this and help me sort it out.

Thanks.

3 answers:

Answer 0 (score: 2)

I'm not entirely sure why your code doesn't work. It should, but here is a workaround:

def print_context(**kwargs):
    ds = kwargs['ds']  # the execution date, passed in through the task context
    print(ds)
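To see why `kwargs['ds']` works, here is a plain-Python sketch that does not need Airflow. It assumes, as Airflow 1.x does when `provide_context=True` is set, that the task context reaches the callable as keyword arguments. The `context` dict is a hypothetical, heavily trimmed stand-in for the real one.

```python
from pprint import pprint


def print_context(**kwargs):
    # Every context entry lands in kwargs; 'ds' is the execution date as YYYY-MM-DD.
    ds = kwargs['ds']
    pprint(kwargs)
    return ds


# Hypothetical, trimmed-down context; the real Airflow context has many more keys.
context = {'ds': '2018-11-26', 'ds_nodash': '20181126'}

# Roughly what happens when the task runs: the context is unpacked as keyword arguments.
result = print_context(**context)
print(result)
```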

python_callable should also be passed like this (the function itself, not the result of calling it):

python_callable=print_context,
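The "Broken DAG" message appears while the DAG file is being parsed. That happens because the operator validates its argument as soon as it is constructed; Airflow 1.10's `PythonOperator.__init__` runs a `callable()` check. The sketch below mimics that check with a hypothetical `FakePythonOperator` class, which is not Airflow code and raises `ValueError` where Airflow raises its own exception type. It shows that `print_context` is accepted and `print_context()` is rejected.

```python
class FakePythonOperator:
    """Hypothetical stand-in for PythonOperator's constructor-time validation."""

    def __init__(self, task_id, python_callable):
        # Reject anything that cannot be called, e.g. the str returned by print_context().
        if not callable(python_callable):
            raise ValueError('`python_callable` param must be callable')
        self.task_id = task_id
        self.python_callable = python_callable


def print_context(**kwargs):
    return 'Whatever you return gets printed in the logs'


# Passing the function object itself: accepted.
ok = FakePythonOperator('good', python_callable=print_context)

# Passing the result of a call (a str): rejected at construction time.
try:
    FakePythonOperator('bad', python_callable=print_context())
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```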

Answer 1 (score: 0)

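To elaborate on your problem: the flow breaks because you are not passing the function print_context to the PythonOperator. Instead, you are passing the result of calling print_context:

[...]
t32_send_notification = PythonOperator(
    task_id='t32_send_notification',
    provide_context=True,
    python_callable=print_context(),  # <-- This is the issue.
    op_kwargs=None,
    dag=dag,
)
[...]

Your function returns the string 'Whatever you return gets printed in the logs', and that string is then supplied to the PythonOperator in the python_callable keyword argument. Airflow is essentially trying to do the following:

your_return = 'Whatever you return gets printed in the logs'
your_return()

...and you get the error you see. The other contributor is right: you should change the PythonOperator.python_callable keyword argument to simply print_context.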
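You can try the `your_return()` line in plain Python to confirm what goes wrong: calling a string raises `TypeError`. This is a minimal sketch that does not depend on Airflow.

```python
def print_context(**kwargs):
    return 'Whatever you return gets printed in the logs'


your_return = print_context()  # the call has already happened; this is now a str
print(type(your_return).__name__)  # str
print(callable(print_context), callable(your_return))  # True False

try:
    your_return()
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # 'str' object is not callable
```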

Answer 2 (score: 0)

In newer versions of Airflow, the following option must be passed to the PythonOperator:

provide_context=True

Otherwise, the ds argument is not passed to your function. This is a recent change in Airflow that I ran into.
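The sketch below shows why leaving out `provide_context=True` produces the error in the question's title. `run_callable` is a hypothetical, simplified model of how an Airflow 1.x `PythonOperator` invokes its callable: with `provide_context`, the task context is merged into the keyword arguments, and any explicit `op_kwargs` take precedence. The `context` dict is a trimmed, hypothetical stand-in.

```python
def run_callable(python_callable, context, provide_context, op_args=None, op_kwargs=None):
    """Hypothetical simplification of how Airflow 1.x PythonOperator calls the function."""
    op_args = list(op_args or [])
    op_kwargs = dict(op_kwargs or {})
    if provide_context:
        # Merge the task context (ds, ...) into the kwargs; explicit op_kwargs win.
        merged = dict(context)
        merged.update(op_kwargs)
        op_kwargs = merged
    return python_callable(*op_args, **op_kwargs)


def print_context(ds, **kwargs):
    return ds


# Hypothetical, trimmed-down context.
context = {'ds': '2018-11-26', 'ds_nodash': '20181126'}

with_context = run_callable(print_context, context, provide_context=True)

# Without the context, nothing supplies the required 'ds' parameter.
try:
    run_callable(print_context, context, provide_context=False)
    error = None
except TypeError as exc:
    error = str(exc)

print(with_context)
print(error)
```

This models Airflow 1.x only. Airflow 2 removed the `provide_context` argument and supplies context values automatically based on the callable's signature.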

Full example:

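from pprint import pprint

from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

# Assumes `dag` is the DAG object defined as in the question.
def print_context(ds, **kwargs):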
    pprint(kwargs)
    print(ds)
    return 'Whatever you return gets printed in the logs'


run_this = PythonOperator(
    task_id='print_the_context',
    provide_context=True,
    python_callable=print_context,
    dag=dag,
)