DummyOperator marked as upstream_failed even though all upstream tasks are marked as success

Time: 2019-02-13 13:00:35

Tags: airflow google-cloud-composer

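I have an Airflow pipeline that generates 12 staging tables from files in Google Cloud Storage and then does some downstream processing. I use a DummyOperator to collect these tasks before moving on to the next step.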

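I am getting an error on the wait_stg_load operator saying it is in the upstream_failed state. However, all of the upstream tasks are marked as success. The DAG itself is now marked as failed. If I clear the state on wait_stg_load, everything runs fine. Any ideas on what I am doing wrong?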

I am using Google Cloud Composer, which runs Airflow v1.9 on Python 3.


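from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.operators.dummy_operator import DummyOperator

# default_args, tables and DATASET_NAME are defined elsewhere in the DAG file.
with DAG('load_data',
    default_args=default_args,
    schedule_interval='0 9 * * *',
    concurrency=3
) as dag:

    # Collector task that every staging load feeds into.
    t2 = DummyOperator(
        task_id='wait_stg_load',
        dag=dag
    )

    # One load task per staging table; dots in the table name are replaced
    # with underscores for the task_id and the destination table.
    for t in tables:
        t1 = GoogleCloudStorageToBigQueryOperator(
            task_id='load_stg_{}'.format(t.replace('.','_')),
            bucket='my-bucket',
            source_objects=['data/{}.json'.format(t)],
            destination_project_dataset_table='{}.stg_{}'.format(DATASET_NAME, t.replace('.','_')),
            schema_object='data/schemas/{}.json'.format(t),
            source_format='NEWLINE_DELIMITED_JSON',
            write_disposition='WRITE_TRUNCATE',
            dag=dag
        )
        t1 >> t2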
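
For reference, the naming expressions in the loop above can be checked without Airflow. The sketch below assumes a hypothetical table name `sales.orders` and a hypothetical dataset `my_dataset`; neither value comes from my actual DAG, and `stage_names` is a helper made up for illustration.

```python
# Standalone check of the naming expressions used in the loop.
# DATASET_NAME and the table name are hypothetical placeholders.
DATASET_NAME = 'my_dataset'


def stage_names(table):
    """Return the names the loop would derive for one table."""
    safe = table.replace('.', '_')  # same dot-to-underscore replacement as in the DAG
    return {
        'task_id': 'load_stg_{}'.format(safe),
        'source_object': 'data/{}.json'.format(table),
        'destination': '{}.stg_{}'.format(DATASET_NAME, safe),
        'schema_object': 'data/schemas/{}.json'.format(table),
    }


names = stage_names('sales.orders')
for key, value in names.items():
    print(key, '=', value)
```

Each table gets a unique task_id, so the 12 load tasks should not collide with each other by name.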

[Image: Airflow diagram]

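Update 1

I think this is a concurrency issue in Airflow. I noticed that the task did in fact fail at some point but still ran later. It was marked as done, but the DummyOperator does not see it.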
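
To illustrate the race I suspect, here is a toy model in plain Python. It does not use Airflow's code: `evaluate_all_success` is a hypothetical helper that mimics an all_success style check. It assumes that the downstream task is marked upstream_failed as soon as any upstream task is seen as failed, and that this state is not revisited afterwards. Both assumptions are guesses based on what I observe, not confirmed Airflow behavior.

```python
def evaluate_all_success(downstream_state, upstream_states):
    """Toy all_success check (hypothetical helper, not Airflow's implementation)."""
    if downstream_state is not None:
        # Assumption: once a state has been set, it is not evaluated again.
        return downstream_state
    if any(s == 'failed' for s in upstream_states):
        return 'upstream_failed'
    if all(s == 'success' for s in upstream_states):
        # A DummyOperator does no work, so treat it as succeeding once runnable.
        return 'success'
    return None  # still waiting on upstream tasks


# Pass 1: one load task is briefly seen as failed.
state = evaluate_all_success(None, ['success', 'failed', 'success'])
# Pass 2: the same load task has since run and succeeded.
state_after = evaluate_all_success(state, ['success', 'success', 'success'])
# Clearing the downstream state (what I do by hand) allows a fresh evaluation.
state_cleared = evaluate_all_success(None, ['success', 'success', 'success'])

print(state, state_after, state_cleared)
```

Under these assumptions the model reproduces what I see: the collector stays in upstream_failed even after every upstream task shows success, and clearing it lets it run.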

[2019-02-14 09:00:14,734] {cli.py:374} INFO - Running on host airflow-worker
[2019-02-14 09:00:16,686] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dag.task 2019-02-13 09:00:00 [queued]>
[2019-02-14 09:00:16,694] {models.py:1189} INFO - Dependencies not met for <TaskInstance: dag.task 2019-02-13 09:00:00 [queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum number of running tasks (3) for this task's DAG 'dag' has been reached.
[2019-02-14 09:00:16,694] {models.py:1389} WARNING -
-------------------------------------------------------------------------------
FIXME: Rescheduling due to concurrency limits reached at task runtime. Attempt 1 of 1. State set to NONE
-------------------------------------------------------------------------------

[2019-02-14 09:00:16,694] {models.py:1392} INFO - Queuing into pool None
[2019-02-14 09:00:26,619] {cli.py:374} INFO - Running on host airflow-worker
[2019-02-14 09:00:28,563] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dag.task 2019-02-13 09:00:00 [failed]>
[2019-02-14 09:00:28,570] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dag.task 2019-02-13 09:00:00 [failed]>
[2019-02-14 09:00:28,570] {models.py:1406} INFO -
-------------------------------------------------------------------------------
Starting attempt 1 of 
-------------------------------------------------------------------------------

[2019-02-14 09:00:28,607] {models.py:1427} INFO - Executing <Task(GoogleCloudStorageToBigQueryOperator): task> on 2019-02-13 09:00:00
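
The state changes in the log above can be pulled out with a short script. This is only a sketch and assumes the log lines keep the format shown here; `LOG` holds three of those lines copied verbatim, and the regular expression is my own.

```python
import re

LOG = """\
[2019-02-14 09:00:16,686] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dag.task 2019-02-13 09:00:00 [queued]>
[2019-02-14 09:00:16,694] {models.py:1189} INFO - Dependencies not met for <TaskInstance: dag.task 2019-02-13 09:00:00 [queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum number of running tasks (3) for this task's DAG 'dag' has been reached.
[2019-02-14 09:00:28,563] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dag.task 2019-02-13 09:00:00 [failed]>
"""

# Capture the log timestamp and the task state printed in square brackets
# at the end of each <TaskInstance: ...> marker.
PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*<TaskInstance: \S+ [\d-]+ [\d:]+ \[(?P<state>\w+)\]>"
)

states = [
    (m.group("ts"), m.group("state"))
    for m in (PATTERN.search(line) for line in LOG.splitlines())
    if m
]

for ts, state in states:
    print(ts, state)
```

The task instance already shows as failed at 09:00:28, roughly 12 seconds after the rescheduling message and before the operator starts executing.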

0 answers:

No answers yet