Airflow BigQueryOperator label issue: label value has invalid characters

Asked: 2020-06-01 08:50:25

Tags: python google-bigquery jinja2 airflow airflow-operator

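I have been trying to use the default variables from the Airflow macros reference together with BigQuery labels to record metadata for queries submitted through the Airflow BigQueryOperator. Here is the operator definition: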

BigQuery_Labels_Test_Task = BigQueryOperator(
    task_id="BigQuery_Labels_Test_Task",
    bql="SELECT 1",
    use_legacy_sql=False,
    bigquery_conn_id="gcp_bq_connection",
    destination_dataset_table="test_dataset.test_table",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_TRUNCATE",
    labels={
        "dag_id": "{{ dag.dag_id }}",
        "task_id": "{{ task.task_id }}",
        "run_id": "{{ run_id }}",
    },
    dag=dag,
)

However, it raises the following error when executed:

[2020-06-01 08:15:13,495] {{taskinstance.py:887}} INFO - Executing <Task(BigQueryOperator): BigQuery_Labels_Test_Task> on 2020-06-01T07:51:40.752935+00:00
[2020-06-01 08:15:13,499] {{standard_task_runner.py:53}} INFO - Started process 16317 to run task
[2020-06-01 08:15:13,567] {{logging_mixin.py:112}} INFO - Running %s on host %s <TaskInstance: BigQuery_Labels_Test_DAG.BigQuery_Labels_Test_Task 2020-06-01T07:51:40.752935+00:00 [running]> b438a71d2d52
[2020-06-01 08:15:13,592] {{bigquery_operator.py:255}} INFO - Executing: SELECT 1
[2020-06-01 08:15:14,161] {{taskinstance.py:1128}} ERROR - <HttpError 400 when requesting https://bigquery.googleapis.com/bigquery/v2/projects/test-project/jobs?alt=json returned "Label value "manual__2020-06-01T07:51:40.752935+00:00" has invalid characters.">
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 966, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/operators/bigquery_operator.py", line 282, in execute
    encryption_configuration=self.encryption_configuration
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 910, in run_query
    return self.run_with_configuration(configuration)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 1318, in run_with_configuration
    .execute(num_retries=self.num_retries)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/googleapiclient/http.py", line 907, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://bigquery.googleapis.com/bigquery/v2/projects/test-project/jobs?alt=json returned "Label value "manual__2020-06-01T07:51:40.752935+00:00" has invalid characters.">
[2020-06-01 08:15:14,165] {{taskinstance.py:1151}} INFO - Marking task as UP_FOR_RETRY

Has anyone run into this? Are there any character restrictions on label fields in BigQuery?

PS: It works when I hard-code the label values as follows.

labels={
    "dag_id": "dag_id",
    "task_id": "task_id",
},

With hard-coded values, too, only lowercase values work.

1 Answer:

Answer 0 (score: 2)

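This is not an Airflow problem but a BigQuery one. The rendered run_id (manual__2020-06-01T07:51:40.752935+00:00) contains + and :, as well as an uppercase T and a period, so it does not meet the following requirement for labels: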

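Keys and values can contain only lowercase letters, numeric characters, underscores, and dashes. All characters must use UTF-8 encoding, and international characters are allowed.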

For more details on BigQuery labels, see the BigQuery documentation on labeling resources.
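
One possible workaround is to sanitize the value before it reaches BigQuery. The following is a minimal sketch, not part of Airflow or the BigQuery client: `to_bq_label_value` is a hypothetical helper name, and it deliberately restricts values to lowercase ASCII letters, digits, underscores and dashes, which is stricter than BigQuery requires (BigQuery also accepts international characters). It also truncates to 63 characters, the documented maximum length for a label value.

```python
import re

# Documented BigQuery maximum length for a label value.
MAX_LABEL_LENGTH = 63


def to_bq_label_value(raw: str) -> str:
    """Hypothetical helper: coerce a string into a conservative BigQuery label value.

    Assumption: we keep only the ASCII subset of allowed characters, so any
    international characters are replaced as well.
    """
    lowered = raw.lower()
    # Replace every character that is not a-z, 0-9, underscore or dash with a dash.
    cleaned = re.sub(r"[^a-z0-9_-]", "-", lowered)
    return cleaned[:MAX_LABEL_LENGTH]


# The run_id from the error message in the question.
print(to_bq_label_value("manual__2020-06-01T07:51:40.752935+00:00"))
# -> manual__2020-06-01t07-51-40-752935-00-00
```

To apply something like this inside the templated labels, the function could be registered as a custom Jinja filter through the DAG's user_defined_filters argument and then used as "{{ run_id | some_filter_name }}", where the filter name is whatever you choose to register. Note that the mapping is lossy: two different run_id values could in principle collapse to the same label.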