I am trying to export a simple MySQL query to a JSON file (CSV would also work), but the task log shows a JSON delimiter error. I have already changed the SQL SELECT and the DAG parameters, with no effect.

I am exporting JSON for testing, but I also tried exporting to CSV and hit the same error. I would prefer CSV if a schema file can be generated with that format; otherwise the solution can be based on a JSON export.
import airflow
from airflow import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from datetime import datetime, timedelta
from airflow.operators.bash_operator import BashOperator
default_args = {
    'owner': 'bexs-data',
    # 'start_date': airflow.utils.dates.days_ago(2),
    # 'end_date': datetime(2019, 6, 17),
    'depends_on_past': False,
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    # If a task fails, retry it once after waiting
    # at least 5 minutes
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
dag = DAG(
    dag_id='test_airflow_mysql',
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    dagrun_timeout=timedelta(minutes=60)
)
import_orders_op = MySqlToGoogleCloudStorageOperator(
    task_id='import_orders',
    mysql_conn_id='mysql_conn',
    google_cloud_storage_conn_id='my_gcp_conn',
    sql='SELECT * FROM bd.test',
    bucket='big-data-sandbox',
    filename='test_airflow_mysql{}.json',
    schema_filename='sc_test_airflow_mysql.json',
    dag=dag)
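On the CSV preference: the file the operator uploads under `schema_filename` is a BigQuery-style JSON schema (a list of name/type/mode objects), which is independent of the row format of the data files. A rough sketch of its shape, with illustrative column names standing in for whatever `bd.test` actually contains:

```python
import json

# Illustrative only -- the real entries depend on the columns of bd.test.
schema = [
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
]

# The operator serializes a structure like this and uploads it to GCS.
print(json.dumps(schema, indent=2))
```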
Error log
--------------------------------------------------------------------------------
Starting attempt 1 of 2
--------------------------------------------------------------------------------
[2019-06-17 15:06:25,768] {{models.py:1595}} INFO - Executing <Task(MySqlToGoogleCloudStorageOperator): import_orders> on 2019-06-17T15:05:47.636804+00:00
[2019-06-17 15:06:25,768] {{base_task_runner.py:118}} INFO - Running: ['bash', '-c', 'airflow run test_airflow_mysql7 import_orders 2019-06-17T15:05:47.636804+00:00 --job_id 129 --raw -sd DAGS_FOLDER/config/test_airflow.py --cfg_path /tmp/tmplrn_14js']
[2019-06-17 15:06:26,882] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:26,881] {{settings.py:174}} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2019-06-17 15:06:27,474] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:27,473] {{__init__.py:51}} INFO - Using executor LocalExecutor
[2019-06-17 15:06:28,301] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:28,299] {{models.py:271}} INFO - Filling up the DagBag from /usr/local/airflow/dags/config/test_airflow.py
[2019-06-17 15:06:28,571] {{base_task_runner.py:101}} INFO - Job 129: Subtask import_orders [2019-06-17 15:06:28,571] {{cli.py:484}} INFO - Running <TaskInstance: test_airflow_mysql7.import_orders 2019-06-17T15:05:47.636804+00:00 [running]> on host 427284f0f03e
[2019-06-17 15:06:28,626] {{logging_mixin.py:95}} INFO - [2019-06-17 15:06:28,625] {{base_hook.py:83}} INFO - Using connection to: 10.0.0.11
[2019-06-17 15:06:28,691] {{models.py:1760}} ERROR - Expecting ',' delimiter: line 11 column 133 (char 2337)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 116, in execute
self._upload_to_gcs(files_to_upload)
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 219, in _upload_to_gcs
hook.upload(self.bucket, object, tmp_file_handle.name, 'application/json')
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcs_hook.py", line 193, in upload
service = self.get_conn()
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcs_hook.py", line 49, in get_conn
http_authorized = self._authorize()
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcp_api_base_hook.py", line 131, in _authorize
credentials = self._get_credentials()
File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/gcp_api_base_hook.py", line 93, in _get_credentials
key_path, scopes=scopes)
File "/usr/local/lib/python3.6/site-packages/google/oauth2/service_account.py", line 209, in from_service_account_file
filename, require=['client_email', 'token_uri'])
File "/usr/local/lib/python3.6/site-packages/google/auth/_service_account_info.py", line 72, in from_filename
data = json.load(json_file)
File "/usr/local/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.6/json/decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 11 column 133 (char 2337)
[2019-06-17 15:06:28,695] {{models.py:1783}} INFO - Marking task as UP_FOR_RETRY
[2019-06-17 15:06:30,744] {{logging_mixin.py:95}} INFO - [2019-06-17 15:06:30,740] {{jobs.py:2627}} INFO - Task exited with return code 1
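Note that, per the traceback, the `JSONDecodeError` is raised while `google.auth` parses the service-account key file behind the GCS connection (`_service_account_info.py` calling `json.load`), not while decoding query results. A minimal stand-alone check of that key file (the path below is a placeholder; substitute the `key_path` configured on the `my_gcp_conn` connection):

```python
import json
import os

# Placeholder path -- use the key_path set on the my_gcp_conn Airflow connection.
KEY_PATH = "/path/to/service_account_key.json"

if not os.path.exists(KEY_PATH):
    print("keyfile not found:", KEY_PATH)
else:
    try:
        with open(KEY_PATH) as f:
            data = json.load(f)
        # A usable key should at least contain client_email and token_uri.
        print("keyfile parses OK; top-level fields:", sorted(data))
    except json.JSONDecodeError as exc:
        # This would reproduce the "Expecting ',' delimiter" message from the log.
        print("keyfile is malformed:", exc)
```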