Nesting multiple repeated fields in BigQuery

Asked: 2016-03-03 19:28:53

Tags: google-bigquery

Loading repeated fields in GBQ by importing a JSON file

By importing a JSON file that contains repeated records into BigQuery, you can create a table with nested repeated fields.

For example, with the schema:

[
  {"type": "STRING", "name": "item"},
  {"type": "RECORD", "name": "click", "mode": "REPEATED", "fields": [
    {"type": "TIMESTAMP", "name": "click_time"},
    {"type": "STRING", "name": "userid"}
  ]}
]

You can load a JSON file of item clicks in which the clicks are repeated for each item. The resulting table has the fields item, click.click_time, and click.userid.
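
To make the shape of such a load file concrete: each item is one JSON object on its own line, with click as an array of objects. Here is a minimal JavaScript sketch. The sample values and the helper name toLoadLine are assumptions for illustration only and are not part of any BigQuery API. The timestamp string format is also an assumption, copied from the sample data later in this thread.

```javascript
// Hypothetical sample item; the values are made up for illustration.
const sampleItem = {
  item: "a1",
  click: [
    { click_time: "2016-03-03 19:52:23 UTC", userid: "u1" },
    { click_time: "2016-03-03 19:52:23 UTC", userid: "u2" },
  ],
};

// Serialize one item as a single line of a newline-delimited JSON file.
// JSON.stringify without an indent argument never emits line breaks.
function toLoadLine(record) {
  return JSON.stringify(record);
}

console.log(toLoadLine(sampleItem));
```

A file for several items would simply contain one such line per item.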

My question

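Suppose you have a CSV file in which the item-click JSON above has already been flattened: one row per click, with the item value repeated on every row. Can you load it into GBQ and use a GBQ query to transform it into the equivalent of the table you would get by loading the JSON file with repeated fields?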

The table produced by the GBQ query on the imported CSV table should contain item, click.click_time, and click.userid as fields.

2 answers:

Answer 0 (score: 3)

Assume you have the flattened data in a table:

item    click_time  userid   
a1  2016-03-03 19:52:23 UTC u1   
a1  2016-03-03 19:52:23 UTC u2   
a1  2016-03-03 19:52:23 UTC u3   
a1  2016-03-03 19:52:23 UTC u4   
a2  2016-03-03 19:52:23 UTC u1   
a2  2016-03-03 19:52:23 UTC u2

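The following GBQ query does what you ask.
Please note: you need to write the result to a table with the 'Allow Large Results' option enabled and the results left unflattened (the 'UnFlatten' option).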

SELECT *
FROM JS( 
  ( // input table 
    SELECT item, NEST(CONCAT(STRING(click_time), ',', STRING(userid))) AS clicks 
    FROM YourTable
    GROUP BY item
  ), 
  item, clicks, // input columns 
  "[ // output schema 
    {'name': 'item', 'type': 'STRING'},
     {'name': 'clicks', 'type': 'RECORD',
     'mode': 'REPEATED',
     'fields': [
       {'name': 'click_time', 'type': 'STRING'},
       {'name': 'userid', 'type': 'STRING'}
       ]    
     }
  ]", 
  "function(row, emit) { // function 
    var c = []; 
    for (var i = 0; i < row.clicks.length; i++) { 
      var x = row.clicks[i].split(','); 
      var t = {click_time: x[0], userid: x[1]};
      c.push(t); 
    }; 
    emit({item: row.item, clicks: c}); 
  }"
) 

The result is expected to have one row per item, with that item's clicks nested in the repeated clicks record.
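
The JavaScript function passed to JS() can also be exercised outside BigQuery to check its logic. Here is a minimal sketch, assuming a hand-written input row shaped like the output of the inner SELECT ... NEST query. The names nestClicks, sampleRow, and emitted are hypothetical and exist only for this illustration.

```javascript
// Same logic as the function in the query, written as a plain function.
function nestClicks(row, emit) {
  var clicks = [];
  for (var i = 0; i < row.clicks.length; i++) {
    // Each element is "click_time,userid", as built by CONCAT in the inner query.
    var parts = row.clicks[i].split(',');
    clicks.push({ click_time: parts[0], userid: parts[1] });
  }
  emit({ item: row.item, clicks: clicks });
}

// Hypothetical input row: one item with its clicks already collected into an array.
var sampleRow = {
  item: 'a2',
  clicks: ['2016-03-03 19:52:23 UTC,u1', '2016-03-03 19:52:23 UTC,u2'],
};

// Stand-in for BigQuery's emit callback: collect the emitted rows in an array.
var emitted = [];
nestClicks(sampleRow, function (out) { emitted.push(out); });
console.log(JSON.stringify(emitted, null, 2));
```

Splitting on a comma only works when neither click_time nor userid can contain one. If they can, a different separator would be needed in both the CONCAT call and the split.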

Answer 1 (score: 3)

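With the introduction of BigQuery Standard SQL, records are much easier to work with. Try the following, and don't forget to uncheck the Use Legacy SQL checkbox under Show Options.
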
WITH YourTable AS (
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u1' AS userid UNION ALL
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u2' AS userid UNION ALL
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u3' AS userid UNION ALL
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u4' AS userid UNION ALL
  SELECT 'a2' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u1' AS userid UNION ALL
  SELECT 'a2' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u2' AS userid
)
SELECT item, ARRAY_AGG(STRUCT(click_time, userid)) AS clicks
FROM YourTable
GROUP BY item
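
As a rough in-memory analogue of what GROUP BY item with ARRAY_AGG(STRUCT(click_time, userid)) produces, the grouping can be sketched in plain JavaScript. This only illustrates the output shape and is not how BigQuery executes the query. The names rows, groupClicks, and nested are hypothetical.

```javascript
// Flattened rows, mirroring the YourTable sample data.
const rows = [
  { item: "a1", click_time: "2016-03-03 19:52:23 UTC", userid: "u1" },
  { item: "a1", click_time: "2016-03-03 19:52:23 UTC", userid: "u2" },
  { item: "a1", click_time: "2016-03-03 19:52:23 UTC", userid: "u3" },
  { item: "a1", click_time: "2016-03-03 19:52:23 UTC", userid: "u4" },
  { item: "a2", click_time: "2016-03-03 19:52:23 UTC", userid: "u1" },
  { item: "a2", click_time: "2016-03-03 19:52:23 UTC", userid: "u2" },
];

// Group rows by item and collect one {click_time, userid} object per row,
// yielding one output row per item with a repeated "clicks" field.
function groupClicks(flatRows) {
  const byItem = new Map();
  for (const { item, click_time, userid } of flatRows) {
    if (!byItem.has(item)) byItem.set(item, []);
    byItem.get(item).push({ click_time, userid });
  }
  return Array.from(byItem, ([item, clicks]) => ({ item, clicks }));
}

const nested = groupClicks(rows);
console.log(JSON.stringify(nested, null, 2));
```

The sketch keeps the input order within each group. In BigQuery, the element order of ARRAY_AGG is not guaranteed unless an ORDER BY clause is given inside the aggregate.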