Update a table with non-null values based on the most recent date

Time: 2017-12-03 16:05:09

Tags: google-bigquery

I have the following table in BigQuery, containing data from GA:

userid  visitid purchase_date
GH8932  12345   2017-04-09
GH8932  12346   null
GH8932  12347   null
GH8932  12348   null
GH8932  12349   2017-05-30
GH8932  12350   null
GH8932  12351   null
GH8932  12352   2017-06-07
GH8932  12353   null
GH8932  12354   2017-06-30
GH8932  12355   null
GH8932  12356   null

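I want to replace every null with a purchase_date: the date of that user's next purchase, ordered by visitid.

The query I am currently using is shown below: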

SELECT
 userid,
 visitid,
FIRST_VALUE(purchase_date IGNORE NULLS) OVER (
  PARTITION BY userid ORDER BY visitid
  ROWS BETWEEN CURRENT ROW AND
  UNBOUNDED FOLLOWING) AS purchase_date
FROM x;

It gives me this result:

userid  visitid purchase_date
GH8932  12345   2017-04-09
GH8932  12346   2017-05-30
GH8932  12347   2017-05-30
GH8932  12348   2017-05-30
GH8932  12349   2017-05-30
GH8932  12350   2017-06-07
GH8932  12351   2017-06-07
GH8932  12352   2017-06-07
GH8932  12353   2017-06-30
GH8932  12354   2017-06-30
GH8932  12355   null 
GH8932  12356   null

Any suggestions on how to fill the last 2 nulls with the final purchase_date?

1 answer:

Answer 0: (score: 1)

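The following is for BigQuery Standard SQL. The first FIRST_VALUE looks forward from the current row for the next non-null purchase_date. The second one sorts by visitid DESC, so it looks backward for the most recent earlier purchase. IFNULL falls back to the second value only when there is no later purchase, which is the case for the trailing rows.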

  
#standardSQL
SELECT
  userid,
  visitid,
  IFNULL(FIRST_VALUE(purchase_date IGNORE NULLS) 
    OVER (PARTITION BY userid ORDER BY visitid
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
  FIRST_VALUE(purchase_date IGNORE NULLS) 
    OVER (PARTITION BY userid ORDER BY visitid DESC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)) AS purchase_date
FROM `project.dataset.table`

You can test and play with the above using the dummy data from the question:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'GH8932' userid, 12345 visitid, '2017-04-09' purchase_date UNION ALL
  SELECT 'GH8932', 12346, NULL UNION ALL
  SELECT 'GH8932', 12347, NULL UNION ALL
  SELECT 'GH8932', 12348, NULL UNION ALL
  SELECT 'GH8932', 12349, '2017-05-30' UNION ALL
  SELECT 'GH8932', 12350, NULL UNION ALL
  SELECT 'GH8932', 12351, NULL UNION ALL
  SELECT 'GH8932', 12352, '2017-06-07' UNION ALL
  SELECT 'GH8932', 12353, NULL UNION ALL
  SELECT 'GH8932', 12354, '2017-06-30' UNION ALL
  SELECT 'GH8932', 12355, NULL UNION ALL
  SELECT 'GH8932', 12356, NULL 
)
SELECT
  userid,
  visitid,
  IFNULL(FIRST_VALUE(purchase_date IGNORE NULLS) 
    OVER (PARTITION BY userid ORDER BY visitid
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
  FIRST_VALUE(purchase_date IGNORE NULLS) 
    OVER (PARTITION BY userid ORDER BY visitid DESC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)) AS purchase_date
FROM `project.dataset.table`
ORDER BY userid, visitid
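
The backward-looking half can also be written without reversing the sort order. The sketch below is an alternative phrasing, not part of the original answer: both windows keep ORDER BY visitid ascending, and LAST_VALUE with a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW picks up the most recent earlier purchase. The shortened dummy data, the `filled` CTE, and the column names next_purchase and prev_purchase are illustrative assumptions.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Shortened copy of the question's data: the last five visits only
  SELECT 'GH8932' userid, 12352 visitid, '2017-06-07' purchase_date UNION ALL
  SELECT 'GH8932', 12353, NULL UNION ALL
  SELECT 'GH8932', 12354, '2017-06-30' UNION ALL
  SELECT 'GH8932', 12355, NULL UNION ALL
  SELECT 'GH8932', 12356, NULL
),
filled AS (
  SELECT
    userid,
    visitid,
    -- Next non-null purchase_date at or after the current visit
    FIRST_VALUE(purchase_date IGNORE NULLS)
      OVER (PARTITION BY userid ORDER BY visitid
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS next_purchase,
    -- Most recent non-null purchase_date at or before the current visit
    LAST_VALUE(purchase_date IGNORE NULLS)
      OVER (PARTITION BY userid ORDER BY visitid
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prev_purchase
  FROM `project.dataset.table`
)
SELECT
  userid,
  visitid,
  -- Prefer the next purchase; fall back to the previous one for trailing rows
  COALESCE(next_purchase, prev_purchase) AS purchase_date
FROM filled
ORDER BY userid, visitid
```

On this shortened data, visit 12352 keeps 2017-06-07 and visits 12353 through 12356 all get 2017-06-30. Keeping next_purchase and prev_purchase as separate columns in the intermediate step also makes it easy to inspect which rows needed the fallback.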