Loading nested arrays into BigQuery

Time: 2016-08-21 13:08:40

Tags: arrays json google-bigquery

I have data in JSON format that contains nested arrays. Here is an example:

"data": {"events": [[1, 1271, 518, 945], [1, 1287, 495, 963],...

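The sub-arrays can have a length of 3 or 4, and the first number is the data type (there are about 30 different types). Is there a way to load this data into BigQuery without converting it into dictionary 'records'?

Thanks, Yaron

- Edit -

This question has a workaround, but it relies on fixed-length sub-arrays, so I guess it does not apply here.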

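2 answers:

Answer 0: (score: 1)

It is not possible to load an array of arrays directly; you need to use a record to wrap the inner level of arrays. The standard SQL reference covers this point (although with respect to the language itself, not loading data): https://cloud.google.com/bigquery/sql-reference/arrays#building-arrays-of-arrays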
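To illustrate what wrapping the inner level in a record could look like before loading, here is a minimal Python sketch. It is not part of the original answer: the function name `wrap_inner_arrays` and the field name `items` are hypothetical, and the target table would need a matching schema (a repeated record with a repeated integer field).

```python
import json


def wrap_inner_arrays(row, inner_field="items"):
    """Return a new row where each sub-array of data.events is wrapped in a record.

    `inner_field` is a hypothetical name for the repeated field that holds
    the numbers of one sub-array; pick whatever fits your schema.
    """
    events = row["data"]["events"]
    wrapped = [{inner_field: list(sub)} for sub in events]
    return {"data": {"events": wrapped}}


# One input line; the second sub-array has length 3 on purpose.
line = '{"data": {"events": [[1, 1271, 518, 945], [1, 1287, 495]]}}'
converted = wrap_inner_arrays(json.loads(line))

# Emit one JSON object per line (newline-delimited JSON).
print(json.dumps(converted))
```

Because each sub-array becomes its own record, sub-arrays of length 3 and 4 can sit side by side without padding.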

Answer 1: (score: 1)

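This might be the wrong direction, since it is not entirely clear what your final goal is, but let me try to help. Somehow I feel that your destination table should look like the one below:

type    metric1 metric2 metric3
   1       1271     518     945
   1       1287     495     963
   2        111     222     333
   3        444     555     666
   4        777     888     999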

So, my recommendation is to do it in two steps.

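Step 1 - Load your data as CSV with just one field, so that each JSON line lands in a single string column. Let's say the table is theTable and the field is data:

data
{"data": {"events": [[1, 1271, 518, 945], [1, 1287, 495, 963]]}}
{"data": {"events": [[2, 111, 222, 333], [3, 444, 555, 666], [4, 777, 888, 999]]}}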

Step 2 - Process theTable to produce the expected schema (see the top of this answer) and save the result into the final table. You can use the following query:

SELECT
  NTH(1, SPLIT(y)) AS type,
  NTH(2, SPLIT(y)) AS metric1,
  NTH(3, SPLIT(y)) AS metric2,
  NTH(4, SPLIT(y)) AS metric3,
FROM (
  SELECT
    REPLACE(REPLACE(COALESCE(y0, y1, y2, y3, y4, y5, y6), '[', ''), ']', '') AS y
  FROM (
    SELECT
      IF(k=0, JSON_EXTRACT(data, '$.data.events[0]'), NULL) AS y0,
      IF(k=1, JSON_EXTRACT(data, '$.data.events[1]'), NULL) AS y1,
      IF(k=2, JSON_EXTRACT(data, '$.data.events[2]'), NULL) AS y2,
      IF(k=3, JSON_EXTRACT(data, '$.data.events[3]'), NULL) AS y3,
      IF(k=4, JSON_EXTRACT(data, '$.data.events[4]'), NULL) AS y4,
      IF(k=5, JSON_EXTRACT(data, '$.data.events[5]'), NULL) AS y5,
      IF(k=6, JSON_EXTRACT(data, '$.data.events[6]'), NULL) AS y6,
    FROM theTable AS a
    CROSS JOIN (
      SELECT k FROM (SELECT 0 AS k), (SELECT 1 AS k), (SELECT 2 AS k),
      (SELECT 3 AS k), (SELECT 4 AS k), (SELECT 5 AS k), (SELECT 6 AS k)
    ) AS b
  )
  HAVING NOT y IS NULL
)

The result will be:

type    metric1 metric2 metric3
   1       1271     518     945
   1       1287     495     963
   2        111     222     333
   3        444     555     666
   4        777     888     999
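As a local cross-check of what this transformation is meant to produce, the same flattening can be sketched in Python. This is an illustration only, not the answer's method: `flatten_events` and `COLUMNS` are hypothetical names, the sketch keeps the numbers as integers (the SQL query splits a string, so it presumably returns string values), and it pads length-3 sub-arrays with None.

```python
import json

# Column names taken from the destination table in the answer.
COLUMNS = ("type", "metric1", "metric2", "metric3")


def flatten_events(json_lines):
    """Turn each sub-array of data.events into one row keyed by COLUMNS."""
    rows = []
    for line in json_lines:
        for sub in json.loads(line)["data"]["events"]:
            # Pad short sub-arrays (length 3) with None so every row has 4 columns.
            # Values beyond the fourth, if any, are dropped by zip().
            padded = list(sub) + [None] * (len(COLUMNS) - len(sub))
            rows.append(dict(zip(COLUMNS, padded)))
    return rows


lines = [
    '{"data": {"events": [[1, 1271, 518, 945], [1, 1287, 495, 963]]}}',
    '{"data": {"events": [[2, 111, 222, 333], [3, 444, 555, 666], [4, 777, 888, 999]]}}',
]
rows = flatten_events(lines)
for row in rows:
    print(row)
```

Unlike the query, this sketch has no fixed limit on the number of sub-arrays per line.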

As you can see, this particular query supports up to 7 sub-arrays per row, but you can decrease or increase that limit by changing the code in three places:

#1

REPLACE(REPLACE(COALESCE(y0, y1, y2, y3, y4, y5, y6), '[', ''), ']', '') AS y

#2

IF(k=0, JSON_EXTRACT(data, '$.data.events[0]'), NULL) AS y0,
IF(k=1, JSON_EXTRACT(data, '$.data.events[1]'), NULL) AS y1,
IF(k=2, JSON_EXTRACT(data, '$.data.events[2]'), NULL) AS y2,
IF(k=3, JSON_EXTRACT(data, '$.data.events[3]'), NULL) AS y3,
IF(k=4, JSON_EXTRACT(data, '$.data.events[4]'), NULL) AS y4,
IF(k=5, JSON_EXTRACT(data, '$.data.events[5]'), NULL) AS y5,
IF(k=6, JSON_EXTRACT(data, '$.data.events[6]'), NULL) AS y6,

#3

SELECT k FROM (SELECT 0 AS k), (SELECT 1 AS k), (SELECT 2 AS k),
(SELECT 3 AS k), (SELECT 4 AS k), (SELECT 5 AS k), (SELECT 6 AS k)

Finally, to test the transformation logic without loading actual data, you can use the script below:

SELECT
  NTH(1, SPLIT(y)) AS type,
  NTH(2, SPLIT(y)) AS metric1,
  NTH(3, SPLIT(y)) AS metric2,
  NTH(4, SPLIT(y)) AS metric3,
FROM (
  SELECT
    REPLACE(REPLACE(COALESCE(y0, y1, y2, y3, y4, y5, y6), '[', ''), ']', '') AS y
  FROM (
    SELECT
      IF(k=0, JSON_EXTRACT(data, '$.data.events[0]'), NULL) AS y0,
      IF(k=1, JSON_EXTRACT(data, '$.data.events[1]'), NULL) AS y1,
      IF(k=2, JSON_EXTRACT(data, '$.data.events[2]'), NULL) AS y2,
      IF(k=3, JSON_EXTRACT(data, '$.data.events[3]'), NULL) AS y3,
      IF(k=4, JSON_EXTRACT(data, '$.data.events[4]'), NULL) AS y4,
      IF(k=5, JSON_EXTRACT(data, '$.data.events[5]'), NULL) AS y5,
      IF(k=6, JSON_EXTRACT(data, '$.data.events[6]'), NULL) AS y6,
    FROM (
      SELECT data FROM
        (SELECT '{"data": {"events": [[1, 1271, 518, 945], [1, 1287, 495, 963]]}}' AS data),
        (SELECT '{"data": {"events": [[2, 111, 222, 333], [3, 444, 555, 666], [4, 777, 888, 999]]}}' AS data)
    ) AS a
    CROSS JOIN (
      SELECT k FROM (SELECT 0 AS k), (SELECT 1 AS k), (SELECT 2 AS k),
      (SELECT 3 AS k), (SELECT 4 AS k), (SELECT 5 AS k), (SELECT 6 AS k)
    ) AS b
  )
  HAVING NOT y IS NULL
)
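Since the sub-array limit shows up in three places in the query (the COALESCE list, the IF lines, and the list of k values), it may be easier to generate the query text than to edit it by hand. The following Python sketch does that under some assumptions: `build_query` is a hypothetical helper, the table name defaults to theTable as in this answer, and the output is only the query string, which you would still run in BigQuery yourself.

```python
def build_query(n, table="theTable"):
    """Build the answer's legacy SQL query for up to n sub-arrays per row."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Place 1: the argument list of COALESCE.
    ys = ", ".join(f"y{i}" for i in range(n))
    # Place 2: one IF(...) line per sub-array position.
    ifs = "\n".join(
        f"      IF(k={i}, JSON_EXTRACT(data, '$.data.events[{i}]'), NULL) AS y{i},"
        for i in range(n)
    )
    # Place 3: the helper table of k values used in the CROSS JOIN.
    ks = ", ".join(f"(SELECT {i} AS k)" for i in range(n))
    return f"""SELECT
  NTH(1, SPLIT(y)) AS type,
  NTH(2, SPLIT(y)) AS metric1,
  NTH(3, SPLIT(y)) AS metric2,
  NTH(4, SPLIT(y)) AS metric3,
FROM (
  SELECT
    REPLACE(REPLACE(COALESCE({ys}), '[', ''), ']', '') AS y
  FROM (
    SELECT
{ifs}
    FROM {table} AS a
    CROSS JOIN (
      SELECT k FROM {ks}
    ) AS b
  )
  HAVING NOT y IS NULL
)"""


# Example: print the query for rows with up to 10 sub-arrays.
print(build_query(10))
```

Calling build_query(7) yields the same three fragments as the hand-written query above, so the generated text can be compared against it before trying a larger n.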

Hope this helps!