在bigquery

时间:2019-02-20 12:12:56

标签: google-bigquery

我正在玩bigquery,并找到了一个有趣的用例。我有一组客户和帐户余额。帐户余额收集记录任何帐户余额更改。

客户:

+---------+--------+
|    ID   |  Name  |
+---------+--------+
| 1       | Alice  |
| 2       | Bob    |
+---------+--------+

帐户余额:

+---------+---------------+---------+------------+
|    ID   |  customer_id  |  value  | timestamp  |
+---------+---------------+---------+------------+
| 1       | 1             |  -500   | 2019-02-12 |
| 2       | 1             |  -200   | 2019-02-10 |
| 3       | 2             |  200    | 2019-02-10 |
| 4       | 1             |  0      | 2019-02-09 |
+---------+---------------+---------+------------+

目标是找出客户的负余额有多长时间。生成的集合如下所示:

+---------+--------+---------------------------------+
|    ID   |  Name  |  Negative account balance since |
+---------+--------+---------------------------------+
| 1       | Alice  |  2 days                         |
+---------+--------+---------------------------------+

Bob不在集合中,因为他的上一个帐户记录显示为正值。

我认为涉及以下步骤:

  • 获取每个客户的上一个帐户余额,看看是否为负
  • 遍历帐户余额值,直到达到正值(或没有其他值)
  • 计算datediff

在sql中是否可能有类似的事情?您对由谁创建此类查询有任何想法吗?为了获得当前帐户余额为负的客户,我使用以下查询:

SELECT customer_id FROM (
  SELECT t.account_balance, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) as seqnum FROM `account_balances` t
) t
WHERE seqnum = 1 AND account_balance<0

1 个答案:

答案 0 :(得分:2)

以下是用于BigQuery标准SQL

#standardSQL
SELECT customer_id, name,
  SUM(IF(negative_positive < 0, days, 0)) negative_days,
  SUM(IF(negative_positive = 0, days, 0)) zero_days,
  SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
  SELECT customer_id, negative_positive, grp,
    1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
  FROM (
    SELECT customer_id, ts, SIGN(value) negative_positive, 
    COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
    FROM (
      SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
      FROM `project.dataset.balances`
    )
  )
  GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name

您可以使用问题中的示例数据来进行测试,如上示例所示

#standardSQL
WITH `project.dataset.balances` AS (
  SELECT 1 customer_id, -500 value, DATE '2019-02-12' ts UNION ALL
  SELECT 1, -200, '2019-02-10' UNION ALL
  SELECT 2, 200, '2019-02-10' UNION ALL
  SELECT 1, 0, '2019-02-09' 
), `project.dataset.customers` AS (
  SELECT 1 id, 'Alice' name UNION ALL
  SELECT 2, 'Bob' 
)
SELECT customer_id, name,
  SUM(IF(negative_positive < 0, days, 0)) negative_days,
  SUM(IF(negative_positive = 0, days, 0)) zero_days,
  SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
  SELECT customer_id, negative_positive, grp,
    1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
  FROM (
    SELECT customer_id, ts, SIGN(value) negative_positive, 
    COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
    FROM (
      SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
      FROM `project.dataset.balances`
    )
  )
  GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
-- ORDER BY customer_id

有结果

Row customer_id name    negative_days   zero_days   positive_days    
1   1           Alice   3               1           0    
2   2           Bob     0               0           1