Question

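I have a wide, flat table stored in Google BigQuery, in a format similar to this:

log_date: integer, sessionid: integer, computer: string, ip: string, event_id: integer, amount: float

I am trying to recreate this table in a hierarchical nested format with 2 levels of nesting, as follows: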

[
  {
    "name": "log_date",
    "type": "integer"
  },
  {
    "name": "session",
    "type": "record",
    "mode": "repeated",
    "fields": [
      {
        "name": "sessionid",
        "type": "integer"
      },
      {
        "name": "computer",
        "type": "string"
      },
      {
        "name": "ip",
        "type": "string"
      },
      {
        "name": "event",
        "type": "record",
        "mode": "repeated",
        "fields": [
          {
            "name": "event_id",
            "type": "integer"
          },
          {
            "name": "amount",
            "type": "float"
          }
        ]
      }
    ]
  }
]

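What is the best way to generate a JSON-formatted data file from the BigQuery table? Is there a different, faster approach than:

1. Downloading the table to an external CSV
2. Building the JSON records and writing them to an external file
3. Uploading the external JSON file to a new BigQuery table

Can we have a direct process that generates the JSON from the existing table?

Thanks, H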

Answer 1

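There is currently no way to automatically convert data to a nested format. If you want to get the data out as JSON rather than CSV, you can use the extract command with the --destination_format flag set to NEWLINE_DELIMITED_JSON. For example: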

bq extract \
    --destination_format=NEWLINE_DELIMITED_JSON \
    yourdataset.table \
    gs://your_bucket/result*.json

Answer 2

This can be done with array_agg in standard SQL.

Note that if you want to nest more than one level deep, you need to use a common table expression (CTE), because array_agg cannot directly contain another array_agg.
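As a rough sketch of that approach, the query below builds the inner `event` array in a CTE and then aggregates the sessions in the outer query. The column names come from the schema in the question. The table name `yourdataset.table` is the placeholder used in Answer 1, and the CTE name `sessions` is only illustrative.

```sql
-- Sketch only: `yourdataset.table` is a placeholder and `sessions` is an
-- illustrative CTE name. Column names are taken from the question's schema.
WITH sessions AS (
  -- Inner level: one row per session, with its events collected into an array.
  SELECT
    log_date,
    sessionid,
    computer,
    ip,
    ARRAY_AGG(STRUCT(event_id, amount)) AS event
  FROM `yourdataset.table`
  GROUP BY log_date, sessionid, computer, ip
)
-- Outer level: one row per log_date, with its sessions collected into an array.
SELECT
  log_date,
  ARRAY_AGG(STRUCT(sessionid, computer, ip, event)) AS session
FROM sessions
GROUP BY log_date
```

Writing the result of such a query to a destination table should give a table with the two-level nested schema, without going through external CSV or JSON files.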


How to create a nested JSON-format table from a flat table in BigQuery?
