Question

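I created a Cloud Function that sends data to BigQuery. The Cloud Function receives its data from Pub/Sub.

Scenario 1: I wrote Python code that sends the JSON data directly to BigQuery. No problem.

Scenario 2: I saved the JSON data to a .json file and uploaded it to BigQuery manually with the bq load command. No problem.

Scenario 3 (where the error occurs): The Cloud Function can receive data from Pub/Sub, but it cannot send it to BigQuery.

Here is the Cloud Function code: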

    from google.cloud import bigquery
    import base64, json, sys, os

    def pubsub_to_bq(event, context):
       if 'data' in event:
          print("Event Data is found : " + str(event['data']))
          name = base64.b64decode(event['data']).decode('utf-8')
       else:
          name = 'World'
       print('Hello {}!'.format(name))


       pubsub_message = base64.b64decode(event['data']).decode('utf-8')
       print(pubsub_message)
       to_bigquery(os.environ['dataset'], os.environ['table'], json.loads(str(pubsub_message)))

    def to_bigquery(dataset, table, document):
       bigquery_client = bigquery.Client()
       table = bigquery_client.dataset(dataset).table(table)

       job_config.source_format = bq.SourceFormat.NEWLINE_DELIMITED_JSON
       job_config = bq.LoadJobConfig()
       job_config.autodetect = True

       errors = bigquery_client.insert_rows_json(table,json_rows=[document],job_config=job_config)
       if errors != [] :
          print(errors, file=sys.stderr)

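I tried two JSON data formats, but had no luck with either: [{"field1": "data1", "field2": "data2"}] or {"field1": "data1", "field2": "data2"}

The only error message I can get from the Cloud Functions event log is: textPayload: "Function execution took 100 ms, finished with status: 'crash'"

Can any expert help me? Thanks.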

Answer 1

If you have a look at the library code, insert_rows_json is defined like this:

    def insert_rows_json(
        self,
        table,
        json_rows,
        row_ids=None,
        skip_invalid_rows=None,
        ignore_unknown_values=None,
        template_suffix=None,
        retry=DEFAULT_RETRY,
        timeout=None,
    ):

There is no job_config parameter! The crash most likely comes from this error.
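
To illustrate the failure mode, here is a minimal sketch. The function below is a hypothetical stand-in that only mirrors the parameter list quoted above; it is not the real client method and makes no BigQuery call. Passing job_config to it raises a TypeError, and an uncaught exception like that is what would make the Cloud Function crash. Note also that in the code as posted, the undefined name bq would presumably raise a NameError even before insert_rows_json is reached.

```python
def insert_rows_json(table, json_rows, row_ids=None, skip_invalid_rows=None,
                     ignore_unknown_values=None, template_suffix=None,
                     retry=None, timeout=None):
    """Hypothetical stand-in: same parameters as the quoted signature, no BigQuery call."""
    return []


def call_with_job_config():
    """Return the error message produced by passing an unsupported keyword."""
    try:
        insert_rows_json("my_table", json_rows=[{"field1": "data1"}],
                         job_config=object())
    except TypeError as exc:
        return str(exc)
    return None


message = call_with_job_config()
print(message)  # the message names the unexpected 'job_config' keyword
```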

The insert_rows_json method performs a streaming insert, not a load job.
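
If a streaming insert is what you want, a minimal sketch could look like the following. The helper names stream_rows and make_client and the table ID are my own assumptions, not part of the library. I also assume a recent google-cloud-bigquery version in which insert_rows_json accepts a "project.dataset.table" string. The helper accepts either of the two JSON shapes from the question (a single object or a list of objects).

```python
import sys


def make_client():
    """Create a real client (requires the google-cloud-bigquery package)."""
    from google.cloud import bigquery
    return bigquery.Client()


def stream_rows(client, table_id, document):
    """Stream one dict or a list of dicts; return the per-row error list."""
    rows = document if isinstance(document, list) else [document]
    # No job_config here: a streaming insert is not a load job.
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        print(errors, file=sys.stderr)
    return errors
```

Inside the Cloud Function this would be called as stream_rows(make_client(), "my-project.my_dataset.my_table", json.loads(pubsub_message)), where the table ID is a placeholder.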

To run a load job from JSON, you can use the load_table_from_json method, which you can also find in the library's source code. Its signature is similar to this (note the job_config option):

    def load_table_from_json(
        self,
        json_rows,
        destination,
        num_retries=_DEFAULT_NUM_RETRIES,
        job_id=None,
        job_id_prefix=None,
        location=None,
        project=None,
        job_config=None,
    ):
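
A load job based on that signature could be sketched as follows. The helper names load_rows and make_client_and_config and the table ID are assumptions of mine. As far as I know, load_table_from_json sets the newline-delimited JSON source format itself, so only autodetect is set on the config here.

```python
def make_client_and_config():
    """Build a real client and load config (requires google-cloud-bigquery)."""
    from google.cloud import bigquery
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True  # let BigQuery infer the schema
    return bigquery.Client(), job_config


def load_rows(client, table_id, document, job_config=None):
    """Load one dict or a list of dicts with a load job and wait for it."""
    rows = document if isinstance(document, list) else [document]
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # blocks until the job finishes; raises if the job failed
    return job
```

In the Cloud Function you would call client, job_config = make_client_and_config() and then load_rows(client, "my-project.my_dataset.my_table", json.loads(pubsub_message), job_config), again with a placeholder table ID. Keep in mind that load jobs are subject to different quotas than streaming inserts, so for one row per Pub/Sub message the streaming route may be the better fit.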
