DAG runs successfully, but the DAG is not available in the Airflow webserver UI / DAG not clickable in Google Cloud Composer

Date: 2019-03-28 10:15:38

Tags: python airflow google-cloud-composer

Below is the Airflow DAG code. It runs perfectly both on locally hosted Airflow and on Cloud Composer. However, the DAG itself is not clickable in the Composer UI. I found a similar issue and tried the accepted answer linked in this question. My issue is similar.

from airflow import DAG

from airflow.operators.dummy_operator import DummyOperator
from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator
from airflow.contrib.operators.dataproc_operator import DataprocClusterDeleteOperator
from airflow.contrib.operators.dataproc_operator import DataProcSparkOperator

from datetime import datetime, timedelta

# copy this package to the dags directory in the GCP Composer bucket
from schemas.schemaValidator import loadSchema
from schemas.schemaValidator import sparkArgListToMap

# change these paths to point to the GCP Composer data directory

## cluster config
clusterConfig = loadSchema("somePath/jobConfig/cluster.yaml", "cluster")

## per-job YAML config
autoLoanCsvToParquetConfig = loadSchema("somePath/jobConfig/job.yaml", "job")

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=3)
}

dag = DAG('usr_job', default_args=default_args, schedule_interval=None)

t1 = DummyOperator(task_id="start", dag=dag)

t2 = DataprocClusterCreateOperator(
    task_id="CreateCluster",
    cluster_name=clusterConfig["cluster"]["cluster_name"],
    project_id=clusterConfig["project_id"],
    num_workers=clusterConfig["cluster"]["worker_config"]["num_instances"],
    image_version=clusterConfig["cluster"]["dataproc_img"],
    master_machine_type=clusterConfig["cluster"]["worker_config"]["machine_type"],
    worker_machine_type=clusterConfig["cluster"]["worker_config"]["machine_type"],
    zone=clusterConfig["region"],
    dag=dag
)

t3 = DataProcSparkOperator(
    task_id="csvToParquet",
    main_class=autoLoanCsvToParquetConfig["job"]["main_class"],
    arguments=autoLoanCsvToParquetConfig["job"]["args"],
    cluster_name=clusterConfig["cluster"]["cluster_name"],
    dataproc_spark_jars=autoLoanCsvToParquetConfig["job"]["jarPath"],
    dataproc_spark_properties=sparkArgListToMap(autoLoanCsvToParquetConfig["spark_params"]),
    dag=dag
)

t4 = DataprocClusterDeleteOperator(
    task_id="deleteCluster",
    cluster_name=clusterConfig["cluster"]["cluster_name"],
    project_id=clusterConfig["project_id"],
    dag=dag
)

t5 = DummyOperator(task_id="stop", dag=dag)

t1 >> t2 >> t3 >> t4 >> t5

The UI gives this error: "This DAG isn't available in the webserver DAG bag object. It shows up in this list because the scheduler marked it as active in the metadata database."

However, when I trigger the DAG manually on Composer, I can see in the log files that it runs successfully.

1 Answer:

Answer 0 (score: 0)

The problem was the path used to pick up the configuration files. I was supplying the path to the data folder in GCS. According to the Google documentation, only the dags folder is synchronized to all nodes, not the data folder. A sketch of a possible fix is shown below.
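One way to avoid this is to ship the YAML configs inside the dags folder itself and build their paths relative to the DAG file, so every node that parses the DAG (including the webserver) can find them. A minimal sketch, assuming a hypothetical jobConfig subfolder next to the DAG file and the asker's own loadSchema helper:

import os

from schemas.schemaValidator import loadSchema

# Resolve the config directory relative to this DAG file; the dags folder
# is synced to every node, so the webserver can parse this file too.
# "jobConfig" is a hypothetical subfolder name based on the question's paths.
CONFIG_DIR = os.path.join(os.path.dirname(__file__), "jobConfig")

clusterConfig = loadSchema(os.path.join(CONFIG_DIR, "cluster.yaml"), "cluster")
autoLoanCsvToParquetConfig = loadSchema(os.path.join(CONFIG_DIR, "job.yaml"), "job")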

Needless to say, this was a problem hit during DAG parsing, which is why the DAG did not show up properly in the UI. More interestingly, these debug messages were not exposed in Composer 1.5 and earlier. They are now available to end users to help with debugging. Anyway, thanks to everyone who helped.
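For anyone debugging a similar "DAG isn't available in the webserver DAG bag" message, one way to surface the parse failure yourself is to load the DAG folder with Airflow's DagBag and inspect its import errors. A minimal sketch, assuming the usual Composer mount path /home/airflow/gcs/dags:

from airflow.models import DagBag

# Parse the DAG folder the same way the webserver does and report
# any files that failed to import (e.g. because of missing config files).
dagbag = DagBag(dag_folder="/home/airflow/gcs/dags", include_examples=False)
for filepath, error in dagbag.import_errors.items():
    print(filepath, error)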