Question

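I have been struggling with a problem in Airflow. {{3}}

When the DAG above is executed, it runs the tasks sequentially, in one of the following orders.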

A-> B-> C1-> C2-> D1-> D2

A-> B-> C2-> C1-> D2-> D1

But my requirement is to run tasks C1 and C2 at the same time. Part of my airflow.cfg:

    declare -a arr
    echo "-------------------------------------"
    echo "Here another example with arr numeric"
    echo "-------------------------------------"
    arr=( 10 200 3000 40000 500000 60 700 8000 90000 100000 )

    echo -e "\n Elements in arr are:\n ${arr[0]} \n ${arr[1]} \n ${arr[2]} \n ${arr[3]} \n ${arr[4]} \n ${arr[5]} \n ${arr[6]} \n ${arr[7]} \n ${arr[8]} \n ${arr[9]}"

    # ${arr[*]} expands to every element, not to a count
    echo -e " \n All elements in arr are : ${arr[*]} \n"

    echo -e " \n Total length of arr is : ${#arr[@]} \n"

    # Use the array length as the loop bound instead of a hard-coded 10
    for (( i=0; i<${#arr[@]}; i++ ))
    do      echo "The value in position $i for arr is [ ${arr[i]} ]"
    done

    # ${#arr[j]} is the string length of element j
    for (( j=0; j<${#arr[@]}; j++ ))
    do      echo "The length in element $j is ${#arr[j]}"
    done

    # ${!arr[@]} expands to the array indices
    for z in "${!arr[@]}"
    do      echo "The key ID is $z"
    done

Answer 1

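Add concurrency = x (where x is an int greater than 1) to your DAG properties.

max_active_runs controls DAG-level concurrency (how many runs of the DAG may be active at once). concurrency controls task-level concurrency (how many task instances of the DAG may run at once).

Example: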

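    from airflow import DAG

    # dag_id and default_args are defined elsewhere in the DAG file
    dag = DAG(
        dag_id,
        default_args=default_args,
        schedule_interval='00 03 * * *',
        max_active_runs=2,  # at most 2 active runs of this DAG
        concurrency=2)      # at most 2 running task instances (max_active_tasks in newer Airflow releases)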
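
To see what the task-level limit changes, here is a toy model in plain Python. It is not Airflow's scheduler: it only groups tasks into batches that could run together under a given limit. The dependency edges (B feeds C1 and C2, C1 feeds D1, C2 feeds D2) are an assumption inferred from the two orderings in the question, and `schedule` is a hypothetical helper, not an Airflow function. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Task -> set of upstream tasks. These edges are an assumption inferred
# from the orderings listed in the question.
DEPS = {
    "A": set(),
    "B": {"A"},
    "C1": {"B"},
    "C2": {"B"},
    "D1": {"C1"},
    "D2": {"C2"},
}


def schedule(deps, concurrency):
    """Group tasks into batches of at most `concurrency` tasks.

    Tasks in the same batch have all their upstream tasks finished,
    so they could run at the same time.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every task whose upstream tasks are all done
        ready = sorted(sorter.get_ready())
        # Split the ready tasks into chunks no larger than the limit
        for start in range(0, len(ready), concurrency):
            batches.append(ready[start:start + concurrency])
        sorter.done(*ready)
    return batches


print(schedule(DEPS, concurrency=1))
# [['A'], ['B'], ['C1'], ['C2'], ['D1'], ['D2']]
print(schedule(DEPS, concurrency=2))
# [['A'], ['B'], ['C1', 'C2'], ['D1', 'D2']]
```

With a limit of 1 the model reproduces the strictly sequential order from the question. With a limit of 2, C1 and C2 land in the same batch. In a real deployment the executor also has to support parallel execution for this to take effect.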

Answer 2

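If you are only testing on a single machine, I suggest using LocalExecutor. SequentialExecutor runs tasks serially, and CeleryExecutor needs a cluster of machines with a message broker.

Also, when you use LocalExecutor, you should use a metadata database other than sqlite, because sqlite does not support the concurrent access that parallel tasks require. You can use Postgres or MySQL instead and change sql_alchemy_conn in the airflow.cfg file accordingly.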
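
As a rough sketch of what that change looks like, the snippet below uses Python's standard `configparser` to print the two relevant airflow.cfg settings. The user name, password, host, port and database name in the connection string are hypothetical placeholders. Placing `sql_alchemy_conn` under `[core]` is an assumption that matches older Airflow releases; newer releases keep it in a `[database]` section, so check the configuration reference for your version.

```python
import configparser
import io

# interpolation=None keeps any '%' in a real password literal
cfg = configparser.ConfigParser(interpolation=None)
cfg["core"] = {
    # Run task instances in parallel on the local machine
    "executor": "LocalExecutor",
    # Hypothetical Postgres credentials and database name; replace with your own
    "sql_alchemy_conn": (
        "postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow"
    ),
}

# Render the settings in the same INI format that airflow.cfg uses
buffer = io.StringIO()
cfg.write(buffer)
fragment = buffer.getvalue()
print(fragment)
```

After editing the real airflow.cfg this way, the metadata database has to be initialized again against the new backend, as the linked how-to page describes.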

Read: https://airflow.apache.org/howto/initialize-database.html

"LocalExecutor", an executor that can parallelize task instances locally.

Answer 3

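This looks like a configuration issue. From your configuration, I can see that the executor is CeleryExecutor. Check the database and message broker components.

If those are not configured to run in parallel, your tasks will not run in parallel either.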
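
One way to run that check is a small script over the text of airflow.cfg. The sketch below is only an illustration: `parallelism_warnings` is a hypothetical helper, the three rules it applies are the ones mentioned in the answers here, and looking for `sql_alchemy_conn` in both `[core]` and `[database]` is an assumption made to cover different Airflow versions.

```python
import configparser


def parallelism_warnings(cfg_text):
    """Return warnings for settings that keep tasks from running in parallel."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    warnings = []

    executor = cfg.get("core", "executor", fallback="")
    if executor == "SequentialExecutor":
        warnings.append("SequentialExecutor runs one task at a time")

    # The section holding sql_alchemy_conn depends on the Airflow version
    conn = cfg.get(
        "core",
        "sql_alchemy_conn",
        fallback=cfg.get("database", "sql_alchemy_conn", fallback=""),
    )
    if conn.startswith("sqlite"):
        warnings.append("sqlite metadata database does not suit parallel tasks")

    # CeleryExecutor cannot hand out work without a message broker
    if executor == "CeleryExecutor" and not cfg.get("celery", "broker_url", fallback=""):
        warnings.append("CeleryExecutor needs broker_url in [celery]")

    return warnings


sample = """
[core]
executor = CeleryExecutor
sql_alchemy_conn = sqlite:////tmp/airflow.db
"""
for warning in parallelism_warnings(sample):
    print(warning)
```

For the sample above the script reports the sqlite database and the missing broker URL. An empty list only means none of these three rules fired, not that the setup is guaranteed to run tasks in parallel.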

How to run tasks in parallel in Apache Airflow
