Timestamp problem when using the `LEAD` window function in BigQuery

Time: 2015-12-29 22:46:42

Tags: google-bigquery sqldatatypes window-functions

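I am trying to get a customer's first order, their next order, and the difference in days between the two. Seems simple enough. The steps I followed are:

  1. Use the MIN() and LEAD() functions.
  2. Pull out the customer's first and second orders.
  3. Run DATEDIFF on those two fields to get the difference in days.
  4. A brief illustration follows: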

    SELECT cust, MIN(ord_time) first_ord, LEAD(ord_time, 1) 
                                          OVER 
                                          (PARTITION BY customer_id
                                          ORDER BY ord_time) next_ord
    FROM
    (SELECT cust, ord_time
    FROM df.orders
    GROUP EACH BY cust, ord_time)
    

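There are also some other filters, joins, and groupings in there, but this is the basic block.

The output should be a field containing the customer ID and two timestamp fields. The two timestamp fields look like this:

[Image: Timestamps in Output]

So everything looks great. However, when I try to run the DATEDIFF() function on the two fields, everything comes back null.

Also, when I hover over either timestamp field, it tells me the data type is TIMESTAMP, but when I try to run any kind of timestamp conversion, to seconds or anything else, the next_ord field causes it to fail with the error "type unknown".

I'm just looking for anything I'm doing wrong, or any way to work around this.

Thanks for your help.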

1 Answer:

Answer 0: (score: 1)

I think this has to do with how window functions handle timestamps.

Here is what I have seen so far:

1. When the source data points are strings, everything works as expected:

SELECT 
  customer_id,
  first_ord,
  next_ord,
  DATEDIFF(next_ord, first_ord) AS diff
FROM (
  SELECT 
    customer_id, 
    LEAD(ord_time, 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord, 
    LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
  FROM 
    (SELECT 1 AS customer_id, '2014-04-08 09:51:24 UTC' AS ord_time),
    (SELECT 1 AS customer_id, '2014-04-08 09:53:31 UTC' AS ord_time),
    (SELECT 1 AS customer_id, '2014-05-08 09:53:31 UTC' AS ord_time),
    (SELECT 2 AS customer_id, '2014-09-12 17:20:43 UTC' AS ord_time),
    (SELECT 2 AS customer_id, '2015-04-16 21:44:18 UTC' AS ord_time)
)
WHERE num = 1

Result:

customer_id  first_ord                next_ord                 diff
1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  0
2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  216

2. When the source data points are timestamps, the result is null, as you described in your question:

SELECT 
  customer_id,
  first_ord,
  next_ord,
  DATEDIFF(next_ord, first_ord) AS diff
FROM (
  SELECT 
    customer_id, 
    LEAD(ord_time, 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord, 
    LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
  FROM 
    (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:51:24 UTC') AS ord_time),
    (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:53:31 UTC') AS ord_time),
    (SELECT 1 AS customer_id, TIMESTAMP('2014-05-08 09:53:31 UTC') AS ord_time),
    (SELECT 2 AS customer_id, TIMESTAMP('2014-09-12 17:20:43 UTC') AS ord_time),
    (SELECT 2 AS customer_id, TIMESTAMP('2015-04-16 21:44:18 UTC') AS ord_time)
)
WHERE num = 1

Result:

customer_id  first_ord                next_ord                 diff
1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  null
2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  null

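3. To "fix" it, I had to cast as follows: convert the timestamp to a string inside LEAD(), then cast it back to a timestamp in the outer query: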

SELECT 
  customer_id,
  TIMESTAMP(first_ord) as first_ord,
  TIMESTAMP(next_ord) as next_ord,
  DATEDIFF(next_ord, first_ord) AS diff
FROM (
  SELECT 
    customer_id, 
    LEAD(STRING(ord_time), 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord, 
    LEAD(STRING(ord_time), 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
  FROM 
    (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:51:24 UTC') AS ord_time),
    (SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:53:31 UTC') AS ord_time),
    (SELECT 1 AS customer_id, TIMESTAMP('2014-05-08 09:53:31 UTC') AS ord_time),
    (SELECT 2 AS customer_id, TIMESTAMP('2014-09-12 17:20:43 UTC') AS ord_time),
    (SELECT 2 AS customer_id, TIMESTAMP('2015-04-16 21:44:18 UTC') AS ord_time)
)
WHERE num = 1

Result:

customer_id  first_ord                next_ord                 diff
1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  0
2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  216
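
For comparison, here is a rough sketch of the same query in BigQuery's standard SQL dialect, which is not what the queries above use (they are legacy SQL). It assumes standard SQL is enabled for the query. The `orders` CTE is a hypothetical inline stand-in for the real table, and `TIMESTAMP_DIFF` replaces legacy `DATEDIFF`. I have not checked whether the two functions round day boundaries identically, so treat the `diff` column as illustrative.

```sql
#standardSQL
-- Hypothetical inline data mirroring the sample rows used above.
WITH orders AS (
  SELECT 1 AS customer_id, TIMESTAMP '2014-04-08 09:51:24+00' AS ord_time UNION ALL
  SELECT 1, TIMESTAMP '2014-04-08 09:53:31+00' UNION ALL
  SELECT 1, TIMESTAMP '2014-05-08 09:53:31+00' UNION ALL
  SELECT 2, TIMESTAMP '2014-09-12 17:20:43+00' UNION ALL
  SELECT 2, TIMESTAMP '2015-04-16 21:44:18+00'
)
SELECT
  customer_id,
  first_ord,
  next_ord,
  -- Whole days between the first and the next order.
  TIMESTAMP_DIFF(next_ord, first_ord, DAY) AS diff
FROM (
  SELECT
    customer_id,
    ord_time AS first_ord,
    -- The order that follows the current one for the same customer.
    LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) AS next_ord,
    -- Number each customer's orders so the earliest can be kept.
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) AS num
  FROM orders
)
WHERE num = 1
```

Here the timestamps pass through `LEAD()` without the STRING/TIMESTAMP round trip; whether that is an option depends on whether the rest of the query can be moved off legacy SQL.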