I am trying to export files from Google Cloud Storage and load them into BigQuery. When I do, I get the following error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 285, in process_file
m = imp.load_source(mod_name, filepath)
File "/root/airflow/dags/mysql_bi_invoices.py", line 8, in <module>
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperatorfrom
ImportError: cannot import name MySqlToGoogleCloudStorageOperatorfrom
Done.
[2018-04-20 15:21:30,773] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-04-20 15:21:30,858] {models.py:189} INFO - Filling up the DagBag from /root/airflow/dags
[2018-04-20 15:21:31,333] {models.py:288} ERROR - Failed to import: /root/airflow/dags/mysql_bi_invoices.py
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 285, in process_file
m = imp.load_source(mod_name, filepath)
File "/root/airflow/dags/mysql_bi_invoices.py", line 8, in <module>
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperatorfrom
ImportError: cannot import name MySqlToGoogleCloudStorageOperatorfrom
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 27, in <module>
args.func(args)
File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 516, in test
dag = dag or get_dag(args)
File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 130, in get_dag
'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: test. Either the dag did not exist or it failed to parse.
My DAG looks like this:
import airflow
from datetime import timedelta, datetime
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperatorfrom
from airflow.contrib.hooks.bigquery_hook import BigQueryHook
default_args = {
    'owner': 'test',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'email': ['test@test'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}
with DAG('test',
         schedule_interval='04 04 * * *',
         default_args=default_args) as dag:

    load_to_bq = GoogleCloudStorageToBigQueryOperator(
        task_id='test_to_bq',
        bucket='test',
        source_objects='gs://test/test_to_bq_folder',
        schema_object='test/file_to_extract.json',
        destination_project_dataset_table='test.final_table',
        source_format='JSON',
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_TRUNCATE',
        google_cloud_storage_conn_id='google_cloud',
        bigquery_conn_id='google_cloud',
        dag=dag
    )
I have tried adding and changing the DAG's parameters, but without success so far. Any insight would be helpful.
Answer 0 (score: 2)
This error is not related to GBQ; look at the error message:
airflow.exceptions.AirflowException: dag_id could not be found: test. Either the dag did not exist or it failed to parse.
First, check whether you can list your DAGs with
airflow list_dags
If that does not work, there is an error in the DAG. Moreover, the cause is already visible in the output:
ImportError: cannot import name MySqlToGoogleCloudStorageOperatorfrom
This looks like a typo; the import should be
MySqlToGoogleCloudStorageOperator
instead.
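In other words, line 8 of mysql_bi_invoices.py would read as follows (a minimal sketch of only the corrected import, leaving the rest of the file unchanged):
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
After fixing it, running airflow list_dags again should confirm that the DAG parses and that dag_id 'test' can be found.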