MySqlToGoogleCloudStorageOperator: export fails

Time: 2019-06-17 15:33:01

Tags: mysql json airflow

I am trying to export a simple MySQL query to a JSON file (it could also be CSV), but in the task log I get a JSON delimiter error. I have already changed the SQL SELECT and changed the DAG parameters, and nothing works.

I am exporting JSON for testing, but I also tried exporting CSV and hit the same error. If a schema file can be created with that format, I would prefer CSV; otherwise a solution based on a JSON export is fine.

import airflow
from airflow import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from datetime import datetime, timedelta
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'bexs-data',
    #'start_date': airflow.utils.dates.days_ago(2),
    # 'end_date': datetime(2019, 6, 17),
    'depends_on_past': False,
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    # If a task fails, retry it once after waiting
    # at least 5 minutes
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_id='test_airflow_mysql',
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    dagrun_timeout=timedelta(minutes=60)
)

import_orders_op = MySqlToGoogleCloudStorageOperator(
    task_id='import_orders',
    mysql_conn_id='mysql_conn',
    google_cloud_storage_conn_id='my_gcp_conn',
    sql='SELECT * FROM bd.test',
    bucket='big-data-sandbox',
    filename='test_airflow_mysql{}.json',
    schema_filename='sc_test_airflow_mysql.json',
    dag=dag)
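A side note on the `filename` argument: the `{}` is a chunk-index placeholder. The operator splits large result sets into several GCS objects (controlled by `approx_max_file_size_bytes`) and fills the placeholder per chunk with `str.format`, roughly:

```python
# Illustration of the naming convention only, not the operator's code:
# each export chunk gets the next integer substituted into the template.
filename = 'test_airflow_mysql{}.json'
chunk_names = [filename.format(i) for i in range(3)]
print(chunk_names)
# ['test_airflow_mysql0.json', 'test_airflow_mysql1.json', 'test_airflow_mysql2.json']
```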

Error log

--------------------------------------------------------------------------------
Starting attempt 1 of 2
--------------------------------------------------------------------------------

[2019-06-17 15:06:25,768] {{models.py:1595}} INFO - Executing <Task(MySqlToGoogleCloudStorageOperator): import_orders> on 2019-06-17T15:05:47.636804+00:00
[2019-06-17 15:06:25,768] {{base_task_runner.py:118}} INFO - Running: ['bash', '-c', 'airflow run test_airflow_mysql7 import_orders 2019-06-17T15:05:47.636804+00:00 --job_id 129 --raw -sd DAGS_FOLDER/config/test_airflow.py --cfg_path /tmp/tmplrn_14js']
[2019-06-17 15:06:26,882] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:26,881] {{settings.py:174}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2019-06-17 15:06:27,474] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:27,473] {{__init__.py:51}} INFO - Using executor LocalExecutor
[2019-06-17 15:06:28,301] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:28,299] {{models.py:271}} INFO - Filling up the DagBag from /usr/local/airflow/dags/config/test_airflow.py
[2019-06-17 15:06:28,571] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:28,571] {{cli.py:484}} INFO - Running <TaskInstance: test_airflow_mysql7.import_orders 2019-06-17T15:05:47.636804+00:00 [running]> on host 427284f0f03e
[2019-06-17 15:06:28,626] {{logging_mixin.py:95}} INFO - [2019-06-17 15:06:28,625] {{base_hook.py:83}} INFO - Using connection to: 10.0.0.11
[2019-06-17 15:06:28,691] {{models.py:1760}} ERROR - Expecting ',' delimiter: line 11 column 133 (char 2337)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 116, in execute
    self._upload_to_gcs(files_to_upload)
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 219, in _upload_to_gcs
    hook.upload(self.bucket, object, tmp_file_handle.name, 'application/json')
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcs_hook.py", line 193, in upload
    service = self.get_conn()
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcs_hook.py", line 49, in get_conn
    http_authorized = self._authorize()
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcp_api_base_hook.py", line 131, in _authorize
    credentials = self._get_credentials()
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcp_api_base_hook.py", line 93, in _get_credentials
    key_path, scopes=scopes)
  File "/usr/local/lib/python3.6/site-packages/google/oauth2/service_account.py", line 209, in from_service_account_file
    filename, require=['client_email', 'token_uri'])
  File "/usr/local/lib/python3.6/site-packages/google/auth/_service_account_info.py", line 72, in from_filename
    data = json.load(json_file)
  File "/usr/local/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 11 column 133 (char 2337)
[2019-06-17 15:06:28,695] {{models.py:1783}} INFO - Marking task as UP_FOR_RETRY
[2019-06-17 15:06:30,744] {{logging_mixin.py:95}} INFO - [2019-06-17 15:06:30,740] {{jobs.py:2627}} INFO - Task exited with return code 1
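Reading the traceback, the `JSONDecodeError` is raised inside `gcp_api_base_hook._get_credentials` while `google.auth` parses the service-account key file for the `my_gcp_conn` connection, not while decoding the exported rows. So the key file itself appears to be malformed JSON at line 11 column 133. A minimal way to check it (the helper name and usage are illustrative, not part of Airflow):

```python
import json

def check_keyfile(path):
    """Try to parse a service-account key file; return (ok, detail)."""
    with open(path) as f:
        try:
            data = json.load(f)
        except json.JSONDecodeError as e:
            # Same line/column information the Airflow log reports.
            return False, 'line %d column %d (char %d): %s' % (
                e.lineno, e.colno, e.pos, e.msg)
    # google.auth requires at least these two fields in the key file.
    missing = [k for k in ('client_email', 'token_uri') if k not in data]
    return (not missing), ('missing fields: %s' % missing if missing else 'ok')
```

Run it against the key file path configured on `my_gcp_conn` (Admin → Connections, key path extra field); if it fails to parse, re-downloading the JSON key from the GCP console usually fixes a truncated or hand-edited file.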

0 Answers:

No answers yet