I'm getting an error when trying to run a DAG from Cloud Composer using the GoogleCloudStorageToBigQueryOperator.
The final error is: {'reason': 'invalid', 'location': 'gs://xxxxxx/xxxx.csv', and when I follow the URL link in the error...
{
  "error": {
    "code": 401,
    "message": "Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "errors": [
      {
        "message": "Login Required.",
        "domain": "global",
        "reason": "required",
        "location": "Authorization",
        "locationType": "header"
      }
    ],
    "status": "UNAUTHENTICATED"
  }
}
I have already configured the Cloud Storage connection...
Conn ID: My_Cloud_Storage
Conn Type: Google Cloud Platform
Project ID: xxxxxx
Keyfile Path: /home/airflow/gcs/data/xxx.json
Keyfile JSON:
Scopes (comma separated): https://www.googleapis.com/auth/cloud-platform
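A minimal sanity check, assuming Airflow 1.10's contrib GoogleCloudStorageHook and the placeholder bucket/object names from the DAG below, that confirms whether the My_Cloud_Storage connection can actually read the file:

from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook

# One-off check (e.g. run from a PythonOperator or an ad-hoc script on the
# Composer worker); bucket and object names are placeholders.
hook = GoogleCloudStorageHook(google_cloud_storage_conn_id='My_Cloud_Storage')
print(hook.exists(bucket='xxxxx', object='xxxx.csv'))  # True if the key file can read the object

If this prints True, the connection and key file are fine and the problem lies elsewhere.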
Code:
from __future__ import print_function
import datetime
from airflow import models
from airflow import DAG
from airflow.operators import bash_operator
from airflow.operators import python_operator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
default_dag_args = {
    # The start_date describes when a DAG is valid / can be run. Set this to a
    # fixed point in time rather than dynamically, since it is evaluated every
    # time a DAG is parsed. See:
    # https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date
    'start_date': datetime.datetime(2019, 4, 15),
}

with models.DAG(
        'Ian_gcs_to_BQ_Test',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:

    load_csv = GoogleCloudStorageToBigQueryOperator(
        task_id='gcs_to_bq_test',
        bucket='xxxxx',
        source_objects=['xxxx.csv'],
        destination_project_dataset_table='xxxx.xxxx.xxxx',
        google_cloud_storage_conn_id='My_Cloud_Storage',
        schema_fields=[
            {'name': 'AAAA', 'type': 'INTEGER', 'mode': 'NULLABLE'},
            {'name': 'BBB_NUMBER', 'type': 'INTEGER', 'mode': 'NULLABLE'},
        ],
        write_disposition='WRITE_TRUNCATE',
        dag=dag)
Answer 0 (score: 1)
OK, it's fixed now. It turns out it wasn't working because of the header row in the file; once I removed it, the load worked fine. A really annoying, completely misleading error message about invalid locations and authorization.
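As a side note, if you'd rather keep the header row in the CSV, the operator also takes a skip_leading_rows argument. A sketch of the same task with that set, reusing the placeholder names from the question:

    load_csv = GoogleCloudStorageToBigQueryOperator(
        task_id='gcs_to_bq_test',
        bucket='xxxxx',
        source_objects=['xxxx.csv'],
        destination_project_dataset_table='xxxx.xxxx.xxxx',
        google_cloud_storage_conn_id='My_Cloud_Storage',
        schema_fields=[
            {'name': 'AAAA', 'type': 'INTEGER', 'mode': 'NULLABLE'},
            {'name': 'BBB_NUMBER', 'type': 'INTEGER', 'mode': 'NULLABLE'},
        ],
        skip_leading_rows=1,  # tell BigQuery to ignore the CSV header row
        write_disposition='WRITE_TRUNCATE',
        dag=dag)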