Error when exporting from Google Storage to BQ using Airflow

Date: 2018-04-20 16:24:36

Tags: google-bigquery airflow

I'm trying to export files from Google Cloud Storage and load them into BigQuery. When I do this, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 285, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/root/airflow/dags/mysql_bi_invoices.py", line 8, in <module>
    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperatorfrom
ImportError: cannot import name MySqlToGoogleCloudStorageOperatorfrom
Done.
[2018-04-20 15:21:30,773] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-04-20 15:21:30,858] {models.py:189} INFO - Filling up the DagBag from /root/airflow/dags
[2018-04-20 15:21:31,333] {models.py:288} ERROR - Failed to import: /root/airflow/dags/mysql_bi_invoices.py
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 285, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/root/airflow/dags/mysql_bi_invoices.py", line 8, in <module>
    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperatorfrom
ImportError: cannot import name MySqlToGoogleCloudStorageOperatorfrom
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 516, in test
    dag = dag or get_dag(args)
  File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 130, in get_dag
    'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: test. Either the dag did not exist or it failed to parse.

My DAG looks like this:

import airflow
from datetime import timedelta, datetime
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperatorfrom
from airflow.contrib.hooks.bigquery_hook import BigQueryHook


default_args = {
    'owner': 'test',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'email': ['test@test'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),    }

with DAG('test',
        schedule_interval='04 04 * * *',
        default_args=default_args) as dag:

    load_to_bq = GoogleCloudStorageToBigQueryOperator(
        task_id='test_to_bq',
        bucket='test',
        source_objects = 'gs://test/test_to_bq_folder',
        schema_object = 'test/file_to_extract.json',
        destination_project_dataset_table='test.final_table',
        source_format='JSON',
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_TRUNCATE',
        google_cloud_storage_conn_id='google_cloud',
        bigquery_conn_id='google_cloud',
        dag = dag
    )

I've tried adding and changing the DAG's parameters, but without success so far. Any insight would be helpful.

1 Answer:

Answer 0 (score: 2)

This error has nothing to do with GBQ; look at the error message:

airflow.exceptions.AirflowException: dag_id could not be found: test. Either the dag did not exist or it failed to parse.

First, check whether you can list the DAGs with

airflow list_dags
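
(You can also run the DAG file directly through the Python interpreter to surface import errors, using the file path from the traceback above:

python /root/airflow/dags/mysql_bi_invoices.py

If the file imports cleanly, this should exit without printing a traceback.)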

If that doesn't work, there is an error in the DAG. Moreover, the cause is already present in the output:

ImportError: cannot import name MySqlToGoogleCloudStorageOperatorfrom

This looks like a typo; the import should be

MySqlToGoogleCloudStorageOperator

instead.
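
For reference, a minimal sketch of the corrected import block and load task, keeping the placeholder names (bucket, objects, connection IDs) from the question. Beyond the typo, note two likely follow-up fixes: source_objects expects a list of object paths relative to the bucket rather than a gs:// URI (the wildcard path below is an assumption), and the contrib BigQuery load hook expects 'NEWLINE_DELIMITED_JSON' rather than 'JSON' as the source format:

from datetime import timedelta

import airflow
from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
# Fixed: the stray "from" fused onto the class name is removed
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator

default_args = {
    'owner': 'test',
    'start_date': airflow.utils.dates.days_ago(1),
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

with DAG('test',
         schedule_interval='04 04 * * *',
         default_args=default_args) as dag:

    load_to_bq = GoogleCloudStorageToBigQueryOperator(
        task_id='test_to_bq',
        bucket='test',
        # A list of object paths relative to the bucket, not a gs:// URI
        source_objects=['test_to_bq_folder/*'],
        schema_object='test/file_to_extract.json',
        destination_project_dataset_table='test.final_table',
        # JSON loads use the NEWLINE_DELIMITED_JSON source format
        source_format='NEWLINE_DELIMITED_JSON',
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_TRUNCATE',
        google_cloud_storage_conn_id='google_cloud',
        bigquery_conn_id='google_cloud',
    )
    # dag=dag is unnecessary inside the `with DAG(...)` block

Once the file imports without errors, airflow list_dags should show the 'test' DAG and the original airflow test command should be able to find it.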