Problem using the Airflow v1.9 PythonOperator

Asked: 2018-01-25 10:34:43

Tags: python jinja2 airflow

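I wrote the following code in my Airflow DAG. I tested the script locally and it works like a dream. Now I am trying to get it to work inside the DAG, but I am not having any luck; I have tried many times to no avail.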

def fill_nulls(ds, file_in):
    csv_file = glob.glob(os.path.join(r'/tmp/', file_in))
    df = pd.read_csv(csv_file, sep='\t', header=None, error_bad_lines=False, index_col=False, dtype='unicode')
    df = df.fillna(r'\N')
    df.loc[:, df.dtypes == object].apply(lambda s: s.str.replace(" ", r'\N'))
    df.to_csv(csv_file, sep='\t', header=None, index=False, quoting=csv.QUOTE_NONE)


fill_nulls = PythonOperator(
    task_id='fill_nulls',
    python_callable=fill_nulls,
    provide_context=True,
    templates_dict={'file_in': 'apollo_export_{{macros.ds_format(macros.ds_add( ds, -2),\'%Y-%m-%d\',\'%Y%m%d\')}}.csv'},
    dag=dag
)

I get the following error:

: Traceback (most recent call last):
[2018-01-25 10:11:50,016] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/bin/airflow", line 27, in <module>
[2018-01-25 10:11:50,017] {base_task_runner.py:98} INFO - Subtask:     args.func(args)
[2018-01-25 10:11:50,017] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-01-25 10:11:50,017] {base_task_runner.py:98} INFO - Subtask:     pool=args.pool,
[2018-01-25 10:11:50,018] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-01-25 10:11:50,018] {base_task_runner.py:98} INFO - Subtask:     result = func(*args, **kwargs)
[2018-01-25 10:11:50,019] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1493, in _run_raw_task
[2018-01-25 10:11:50,019] {base_task_runner.py:98} INFO - Subtask:     result = task_copy.execute(context=context)
[2018-01-25 10:11:50,020] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 89, in execute
[2018-01-25 10:11:50,020] {base_task_runner.py:98} INFO - Subtask:     return_value = self.execute_callable()
[2018-01-25 10:11:50,020] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 94, in execute_callable
[2018-01-25 10:11:50,021] {base_task_runner.py:98} INFO - Subtask:     return self.python_callable(*self.op_args, **self.op_kwargs)
[2018-01-25 10:11:50,021] {base_task_runner.py:98} INFO - Subtask: TypeError: fill_nulls() got an unexpected keyword argument 'next_execution_date'

Any help is greatly appreciated!

1 answer:

Answer 0 (score: 5)

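I think you are missing kwargs in the function definition. With provide_context=True, Airflow passes the task context to the callable as keyword arguments (next_execution_date is one of them), so the function has to accept them: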

def fill_nulls(ds, file_in, **kwargs):
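
To see why the signature matters, here is a minimal sketch that runs without Airflow. The `fake_context` dict and the `call_like_operator` helper are hypothetical stand-ins for what PythonOperator does when it calls `python_callable` with the context as keyword arguments; the real context has many more keys. It also assumes, as far as I understand the operator, that the rendered `templates_dict` arrives as `kwargs['templates_dict']` rather than as a separate `file_in` argument, so the sketch reads the filename from there.

```python
# Hand-built stand-in for the context Airflow passes when provide_context=True.
# The keys and values are illustrative only, not real Airflow output.
fake_context = {
    "ds": "2018-01-25",
    "next_execution_date": "2018-01-26T00:00:00",
    "templates_dict": {"file_in": "apollo_export_20180123.csv"},
}


def fill_nulls_strict(ds, file_in):
    """Signature from the question: rejects any keyword it does not name."""
    return file_in


def fill_nulls(ds, **kwargs):
    """Accepts the extra context keys and reads the rendered template value."""
    file_in = kwargs["templates_dict"]["file_in"]
    return ds, file_in


def call_like_operator(func, context):
    """Hypothetical helper: mimics python_callable(**context)."""
    return func(**context)


# The strict signature fails on the first context key it does not declare.
try:
    call_like_operator(fill_nulls_strict, fake_context)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The **kwargs signature absorbs the extra keys.
result = call_like_operator(fill_nulls, fake_context)

print(error_message)
print(result)
```

If you prefer to keep `file_in` as a named parameter, passing it through `op_kwargs` instead of `templates_dict` is another option, but check whether `op_kwargs` is templated in your Airflow version before relying on the Jinja expression there.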