Flatten ordered values in BigQuery

Time: 2017-08-07 16:06:08

Tags: sql google-bigquery

Can I get the last NOT NULL value of each column from a table like this in a single, simple query?

ID | Name  | Inserted_at | Custom.Value1 | Custom.Value2
1  | Allan | 2017-08-01  | NULL          | NULL
1  | NULL  | 2017-08-03  | Value1        | NULL
1  | NULL  | 2017-08-05  | Value2        | Value3
2  | Jones | 2017-08-02  | NULL          | NULL

I expect the following result:

1  | Allan | 2017-08-05  | Value2        | Value3
2  | Jones | 2017-08-02  | NULL          | NULL

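I know that updates in BigQuery are nearly impossible, and a naive MAX / GROUP BY / ORDER BY does not seem to give the right result.

Does anyone know how to solve this?

Thanks!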

2 answers:

Answer 0: (score: 4)

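Here is an example. ARRAY_AGG with IGNORE NULLS drops the NULL values, ORDER BY Inserted_at DESC LIMIT 1 keeps only the most recent remaining one, and [OFFSET(0)] extracts it from the array:

#standardSQL
SELECT
  ID,
  ARRAY_AGG(Name IGNORE NULLS ORDER BY Inserted_at DESC LIMIT 1)[OFFSET(0)] AS Name,
  MAX(Inserted_at) AS Inserted_at,
  ARRAY_AGG(Custom.Value1 IGNORE NULLS ORDER BY Inserted_at DESC LIMIT 1)[OFFSET(0)] AS Value1,
  ARRAY_AGG(Custom.Value2 IGNORE NULLS ORDER BY Inserted_at DESC LIMIT 1)[OFFSET(0)] AS Value2
FROM YourTable
GROUP BY ID;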

You can try it with the sample data:

#standardSQL
WITH YourTable AS (
  SELECT 1 AS ID, 'Allan' AS Name, DATE '2017-08-01' AS Inserted_at, STRUCT(CAST(NULL AS STRING) AS Value1, CAST(NULL AS STRING) AS Value2) AS Custom UNION ALL
  SELECT 1, NULL, DATE '2017-08-03', STRUCT('Value1' AS Value1, NULL AS Value2) UNION ALL
  SELECT 1, NULL, DATE '2017-08-05', STRUCT('Value2' AS Value1, 'Value3' AS Value2) UNION ALL
  SELECT 2, 'Jones', DATE '2017-08-02', STRUCT(NULL AS Value1, NULL AS Value2)
)
SELECT
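  ID,
  -- DESC puts the most recent non-NULL value first, so LIMIT 1 keeps the last one inserted.
  ARRAY_AGG(Name IGNORE NULLS ORDER BY Inserted_at DESC LIMIT 1)[OFFSET(0)] AS Name,
  MAX(Inserted_at) AS Inserted_at,
  ARRAY_AGG(Custom.Value1 IGNORE NULLS ORDER BY Inserted_at DESC LIMIT 1)[OFFSET(0)] AS Value1,
  ARRAY_AGG(Custom.Value2 IGNORE NULLS ORDER BY Inserted_at DESC LIMIT 1)[OFFSET(0)] AS Value2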
FROM YourTable
GROUP BY ID;

Answer 1: (score: 1)

You can use first_value():

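-- In each window, non-NULL values sort first and newer rows sort before older ones,
-- so first_value() returns the latest non-NULL value (or NULL if there is none).
select distinct id,
       first_value(name) over
           (partition by id
            order by (case when name is not null then 1 else 2 end), inserted_at desc
           ) as name,
       -- A window max avoids mixing an aggregate with the non-grouped columns.
       max(inserted_at) over (partition by id) as inserted_at,
       first_value(Custom.Value1) over
           (partition by id
            order by (case when Custom.Value1 is not null then 1 else 2 end), inserted_at desc
           ) as Value1,
       first_value(Custom.Value2) over
           (partition by id
            order by (case when Custom.Value2 is not null then 1 else 2 end), inserted_at desc
           ) as Value2
from t;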
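
To try the window-function approach on the question's sample data, something like the following sketch should work. The CTE named t is only a stand-in for the real table (an assumption for illustration), and the NULLs inside the structs are cast explicitly so that every row has the same struct type.

```sql
#standardSQL
-- Hypothetical stand-in for the real table, built from the rows in the question.
WITH t AS (
  SELECT 1 AS ID, 'Allan' AS Name, DATE '2017-08-01' AS Inserted_at,
         STRUCT(CAST(NULL AS STRING) AS Value1, CAST(NULL AS STRING) AS Value2) AS Custom UNION ALL
  SELECT 1, CAST(NULL AS STRING), DATE '2017-08-03',
         STRUCT('Value1' AS Value1, CAST(NULL AS STRING) AS Value2) UNION ALL
  SELECT 1, CAST(NULL AS STRING), DATE '2017-08-05',
         STRUCT('Value2' AS Value1, 'Value3' AS Value2) UNION ALL
  SELECT 2, 'Jones', DATE '2017-08-02',
         STRUCT(CAST(NULL AS STRING) AS Value1, CAST(NULL AS STRING) AS Value2)
)
SELECT DISTINCT
  ID,
  -- Non-NULL names sort first, newest first; FIRST_VALUE takes the top row of the partition.
  FIRST_VALUE(Name) OVER (
    PARTITION BY ID
    ORDER BY (CASE WHEN Name IS NOT NULL THEN 1 ELSE 2 END), Inserted_at DESC
  ) AS Name,
  -- Latest insertion date per ID, computed as a window aggregate so no GROUP BY is needed.
  MAX(Inserted_at) OVER (PARTITION BY ID) AS Inserted_at,
  FIRST_VALUE(Custom.Value1) OVER (
    PARTITION BY ID
    ORDER BY (CASE WHEN Custom.Value1 IS NOT NULL THEN 1 ELSE 2 END), Inserted_at DESC
  ) AS Value1,
  FIRST_VALUE(Custom.Value2) OVER (
    PARTITION BY ID
    ORDER BY (CASE WHEN Custom.Value2 IS NOT NULL THEN 1 ELSE 2 END), Inserted_at DESC
  ) AS Value2
FROM t
ORDER BY ID;
```

Every row of a given ID gets the same values from the window functions, so SELECT DISTINCT collapses them to one row per ID. When a column is NULL in every row for an ID, as Custom.Value1 is for ID 2, the first row of the window is itself NULL and the result stays NULL.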