Airflow task argument from the return value of a previous task

Time: 2017-07-05 18:00:53

Tags: python parallel-processing airflow apache-airflow

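How can I set a function's argument to the value returned by a previously run task/function? Note that the tasks are defined programmatically, so I can't simply use xcom_pull(task_id="some_task"), because the tasks are defined in a loop (as shown below):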

def scrape(site):
    return requests.get(site).content

def echo(**kwargs):
    ti = kwargs['ti']
    # how can I use the input from `scrape()` here?

for idx, site in enumerate(sites):
    myop = PythonOperator(task_id='scape_%s' % str(idx), python_callable=scrape, op_args=[site], dag=dag)
    echo_op = PythonOperator(task_id='echo_%s' % str(idx), dag=dag, provide_context=True, python_callable=echo)
    myop.set_downstream(echo_op)

1 answer:

Answer 0 (score: 0)

The only thing missing is passing the required task_id to the downstream callable so that it can read the XCom value:

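import requests
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

# `dag` and `sites` are assumed to be defined elsewhere, as in the question.

def scrape(site):
    # PythonOperator pushes the callable's return value to XCom.
    return requests.get(site).content

def echo(**kwargs):
    ti = kwargs['ti']
    # Entries from op_kwargs arrive here as keyword arguments.
    myop_task_id_value = kwargs['myop_task_id_key']
    x = ti.xcom_pull(task_ids=myop_task_id_value)  # pull the XCom value returned by scrape
    print(x)


for idx, site in enumerate(sites):
    # Build the upstream task_id once so both operators can share it.
    myop_task_id = 'scape_%s' % idx
    myop = PythonOperator(task_id=myop_task_id, python_callable=scrape, op_args=[site], dag=dag)
    echo_op = PythonOperator(task_id='echo_%s' % idx,
                             dag=dag,
                             provide_context=True,
                             python_callable=echo,
                             op_kwargs={'myop_task_id_key': myop_task_id}
                             )
    myop.set_downstream(echo_op)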
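
To check the lookup logic in `echo` without a running scheduler, the callable can be exercised with a stand-in for the task instance. The following is a minimal sketch, not part of Airflow: `FakeTaskInstance` is a hypothetical stub that only mimics the `xcom_pull(task_ids=...)` call used above, the byte strings are made-up sample data, and `echo` here also returns the value (an addition, so the result can be inspected).

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance (assumption, not an Airflow class)."""

    def __init__(self, xcoms):
        # Maps task_id -> the value that task's callable would have returned.
        self._xcoms = xcoms

    def xcom_pull(self, task_ids):
        # The stub returns None for unknown task ids.
        return self._xcoms.get(task_ids)


def echo(**kwargs):
    ti = kwargs['ti']
    # The upstream task_id is passed in explicitly, as with op_kwargs above.
    upstream_id = kwargs['myop_task_id_key']
    value = ti.xcom_pull(task_ids=upstream_id)
    print(value)
    return value  # returned only so the sketch can be checked


# Made-up sample data keyed by the same task_id pattern used in the loop.
fake_ti = FakeTaskInstance({
    'scape_0': b'<html>site 0</html>',
    'scape_1': b'<html>site 1</html>',
})
result = echo(ti=fake_ti, myop_task_id_key='scape_1')
```

Because each `echo_N` task receives its own `myop_task_id_key`, every downstream task reads only the value produced by its matching `scape_N` task, even though all of them share the same callable.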