Bigquery根据id和id_type聚合成数组

时间:2019-01-25 04:16:50

标签: google-bigquery standard-sql

我有一个与此表类似的表:

batch_size=1

enter image description here

现在,我尝试为每个对象创建三个新列batch_size=1WITH table AS ( SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL SELECT 1, 23, 1 UNION ALL SELECT 1, 24, 1 UNION ALL SELECT 1, 2, 0 UNION ALL SELECT 1, 2, 0 UNION ALL SELECT 2, 34, 1 UNION ALL SELECT 2, 46, 1 UNION ALL SELECT 2, 465, 2 UNION ALL SELECT 2, 349, 2 UNION ALL SELECT 2, 4, 0 UNION ALL SELECT 2, 3, 0 ) SELECT object_id, type_id, type_level FROM table type_level_0_array并将相应级别的类型的type_id聚合到那些数组中(我不是在寻找用逗号分隔的字符串)。

因此,我的结果表应如下所示:

type_level_1_array

有什么方法可以做到吗?

更新:

尽管我的type_id似乎具有某种模式,例如0级类型的长度为1,1级类型的长度为2,依此类推,在我的实际数据集中没有这种模式。仅通过查看任何行的type_level_2_array即可确定级别。

2 个答案:

答案 0 :(得分:2)

以下是用于BigQuery标准SQL

https://login.microsoftonline.com/tenantid/oauth2/v2.0/token

• Request Headers:

• Content-Type:"application/x-www-form-urlencoded"

• Postman-Token:"27a57c92-a5aa-47b7-8121-01ceb18d1d50"

• User-Agent:"PostmanRuntime/7.6.0"

• Host:"login.microsoftonline.com"


• Request Body:

• client_id:"***********************"

• client_secret:"*********************"

• scope:"https://outlook.office365.com/EWS.AccessAsUser.All"

• grant_type:"client_credentials"

您可以使用以下问题中的示例数据来测试,玩转

#standardSQL
SELECT object_id,
  ARRAY_AGG(DISTINCT IF(type_level = 0, type_id, NULL) IGNORE NULLS) AS type_level_0_array,
  ARRAY_AGG(DISTINCT IF(type_level = 1, type_id, NULL) IGNORE NULLS) AS type_level_1_array,
  ARRAY_AGG(DISTINCT IF(type_level = 2, type_id, NULL) IGNORE NULLS) AS type_level_2_array
FROM `project.dataset.table`
GROUP BY object_id    

有结果

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL
  SELECT 1, 23, 1 UNION ALL
  SELECT 1, 24, 1 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 2, 34, 1 UNION ALL
  SELECT 2, 46, 1 UNION ALL
  SELECT 2, 465, 2 UNION ALL
  SELECT 2, 349, 2 UNION ALL
  SELECT 2, 4, 0 UNION ALL
  SELECT 2, 3, 0 )
SELECT object_id,
  ARRAY_AGG(DISTINCT IF(type_level = 0, type_id, NULL) IGNORE NULLS) AS type_level_0_array,
  ARRAY_AGG(DISTINCT IF(type_level = 1, type_id, NULL) IGNORE NULLS) AS type_level_1_array,
  ARRAY_AGG(DISTINCT IF(type_level = 2, type_id, NULL) IGNORE NULLS) AS type_level_2_array
FROM `project.dataset.table`
GROUP BY object_id   

答案 1 :(得分:0)

尝试一下。为我工作。

Bigquery不允许您创建其中包含Null的数组,这就是为什么需要IGNORE NULLS的原因。

编辑:我已经将代码更新为基于type_level列

WITH table
 AS (
  SELECT 1 object_id, 234 type_id, 2 type_level UNION ALL
  SELECT 1, 23, 1 UNION ALL
  SELECT 1, 24, 1 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 1, 2, 0 UNION ALL
  SELECT 2, 34, 1 UNION ALL
  SELECT 2, 46, 1 UNION ALL
  SELECT 2, 465, 2 UNION ALL
  SELECT 2, 349, 2 UNION ALL
  SELECT 2, 4, 0 UNION ALL
  SELECT 2, 3, 0 )
SELECT
  ARRAY_AGG(CASE WHEN type_level = 0 THEN type_id ELSE NULL END IGNORE NULLS) AS type_level_0_array
  , ARRAY_AGG(CASE WHEN type_level = 1 THEN type_id ELSE NULL END IGNORE NULLS) AS type_level_1_array
  , ARRAY_AGG(CASE WHEN type_level = 2 THEN type_id ELSE NULL END IGNORE NULLS) AS type_level_2_array
FROM
  table