I have a list of account balances over time. The schema looks like this:
+-------------+---------+-------+---------------------+
| customer_id | city_id | value | timestamp           |
+-------------+---------+-------+---------------------+
| 1           | 1       | -500  | 2019-02-12T00:00:00 |
| 2           | 1       | -200  | 2019-02-12T00:00:00 |
| 3           | 2       | 200   | 2019-02-10T00:00:00 |
| 4           | 1       | -10   | 2019-02-09T00:00:00 |
+-------------+---------+-------+---------------------+
I want to aggregate this data to get the daily total negative account balance per city, sorted by time:
+---------+-------+------------+
| city_id | value | timestamp  |
+---------+-------+------------+
| 1       | -500  | 2019-02-12 |
| 1       | -200  | 2019-02-10 |
| 1       | -10   | 2019-02-09 |
+---------+-------+------------+
What I have tried:
SELECT city_id, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as date,
SUM(value) OVER (PARTITION BY city_id ORDER BY FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp))) negative_account_balance
FROM `account_balances`
WHERE value < 0
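However, this gives me weird account balance values, such as -5.985856421224E10. Any ideas? Apart from that, the query also returns several rows for the same city on the same day. I want it to return only one row per city per day.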
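To see where both symptoms come from, here is a minimal sketch that runs the query above against the sample rows in an in-memory SQLite database via Python's sqlite3 module. Assumptions: SQLite 3.25+ (for window functions), `date()` standing in for BigQuery's `FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(...))`, and an `ORDER BY` added only to make the output deterministic. Because there is no `GROUP BY`, the window function emits one row per input row, and each row carries the running total of every earlier and same-day row in that city. On a large table that running total keeps growing, which would plausibly explain magnitudes like -5.98E10 (the exponent notation suggests a FLOAT64 column, though the question does not say so).

```python
import sqlite3

# Sample rows from the question: (customer_id, city_id, value, timestamp).
ROWS = [
    (1, 1, -500, "2019-02-12T00:00:00"),
    (2, 1, -200, "2019-02-12T00:00:00"),
    (3, 2, 200, "2019-02-10T00:00:00"),
    (4, 1, -10, "2019-02-09T00:00:00"),
]


def original_query(rows):
    """Run a SQLite port of the question's query and return all result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_balances "
        "(customer_id INTEGER, city_id INTEGER, value INTEGER, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO account_balances VALUES (?, ?, ?, ?)", rows)
    # No GROUP BY: one output row per input row, each holding the running
    # total of all rows in the city up to and including that day.
    result = conn.execute(
        """
        SELECT city_id, date(timestamp) AS day,
               SUM(value) OVER (PARTITION BY city_id ORDER BY date(timestamp))
                 AS negative_account_balance
        FROM account_balances
        WHERE value < 0
        ORDER BY city_id, day
        """
    ).fetchall()
    conn.close()
    return result


print(original_query(ROWS))
```

City 1 appears twice for 2019-02-12 (once per customer row), and both rows show the cumulative -710 rather than that day's own total.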
Answer 0 (score: 2)
The following is for BigQuery Standard SQL:
#standardSQL
SELECT city_id, account_balance, `date` FROM (
SELECT city_id, `date`,
SUM(value) OVER(PARTITION BY city_id ORDER BY `date`) account_balance
FROM (
SELECT city_id, DATE(TIMESTAMP(t.timestamp)) AS `date`, SUM(value) value
FROM `project.dataset.account_balances` t
GROUP BY city_id, `date` )
)
WHERE account_balance< 0
You can test and play with the query above using sample/dummy data, as in the example below:
#standardSQL
WITH `project.dataset.account_balances` AS (
SELECT 1 customer_id, 1 city_id, -500 value, '2019-02-12T00:00:00' `timestamp` UNION ALL
SELECT 2, 1, -200, '2019-02-12T00:00:00' UNION ALL
SELECT 5, 1, 100, '2019-02-13T00:00:00' UNION ALL
SELECT 3, 2, 200, '2019-02-10T00:00:00' UNION ALL
SELECT 4, 1, -10, '2019-02-09T00:00:00'
)
SELECT city_id, account_balance, `date` FROM (
SELECT city_id, `date`,
SUM(value) OVER(PARTITION BY city_id ORDER BY `date`) account_balance
FROM (
SELECT city_id, DATE(TIMESTAMP(t.timestamp)) AS `date`, SUM(value) value
FROM `project.dataset.account_balances` t
GROUP BY city_id, `date` )
)
WHERE account_balance< 0
This produces the following result:
Row  city_id  account_balance  date
1    1        -10              2019-02-09
2    1        -710             2019-02-12
3    1        -610             2019-02-13
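The same three steps can be checked offline. The sketch below is a port of the query above to SQLite via Python's sqlite3 module, under these assumptions: SQLite 3.25+ for window functions, `date()` replacing `DATE(TIMESTAMP(...))`, an in-memory table, and an added `ORDER BY` so the output order is deterministic. The innermost `GROUP BY` collapses each city/day into one row (which removes the duplicates), the window `SUM` then builds the running balance over those daily totals, and the outer `WHERE` keeps only negative balances.

```python
import sqlite3

# Dummy rows from the answer: (customer_id, city_id, value, timestamp).
ROWS = [
    (1, 1, -500, "2019-02-12T00:00:00"),
    (2, 1, -200, "2019-02-12T00:00:00"),
    (5, 1, 100, "2019-02-13T00:00:00"),
    (3, 2, 200, "2019-02-10T00:00:00"),
    (4, 1, -10, "2019-02-09T00:00:00"),
]

QUERY = """
SELECT city_id, account_balance, day FROM (
  -- Step 2: running balance per city over the daily totals.
  SELECT city_id, day,
         SUM(value) OVER (PARTITION BY city_id ORDER BY day) AS account_balance
  FROM (
    -- Step 1: one row per city and calendar day.
    SELECT city_id, date(timestamp) AS day, SUM(value) AS value
    FROM account_balances
    GROUP BY city_id, date(timestamp)
  ) AS per_day
) AS running
-- Step 3: keep only days with a negative running balance.
WHERE account_balance < 0
ORDER BY city_id, day
"""


def daily_negative_balances(rows):
    """Return (city_id, account_balance, day) rows with a negative balance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_balances "
        "(customer_id INTEGER, city_id INTEGER, value INTEGER, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO account_balances VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(daily_negative_balances(ROWS))
```

With these rows it returns the same three rows as the BigQuery result: city 1 goes -10, then -710, then -610 after the +100 entry, and city 2 drops out because its balance stays positive.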
Answer 1 (score: 1)
I took a simpler approach and used this SQL (by the way, when I tried your original query, the results I got looked OK):
SELECT city_id, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as date,
SUM(value) as value
FROM `account_balances`
GROUP BY city_id, timestamp
HAVING value < 0
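Unlike the first answer, this query returns each day's net total rather than a running balance. It also groups by the raw `timestamp` column instead of the formatted day, so it yields one row per city per day only as long as all entries for a day share the same time (as in the sample data, where every time is midnight). A minimal sketch of the difference, using Python's sqlite3 as a stand-in for BigQuery and a hypothetical second city-1 entry at 08:30 (not from the thread); `date()` replaces `FORMAT_TIMESTAMP`, and `daily_negative_sums` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical rows: two city-1 entries on 2019-02-12 at different times.
ROWS = [
    (1, 1, -500, "2019-02-12 00:00:00"),
    (2, 1, -200, "2019-02-12 08:30:00"),
    (3, 2, 200, "2019-02-10 00:00:00"),
    (4, 1, -10, "2019-02-09 00:00:00"),
]

# Whitelisted grouping expressions: calendar day vs. raw timestamp value.
GROUP_KEYS = {"day": "date(timestamp)", "timestamp": "timestamp"}


def daily_negative_sums(rows, group_key):
    """Sum values per city and group key, keeping only negative totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_balances "
        "(customer_id INTEGER, city_id INTEGER, value INTEGER, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO account_balances VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        f"""
        SELECT city_id, date(timestamp) AS day, SUM(value) AS total
        FROM account_balances
        GROUP BY city_id, {GROUP_KEYS[group_key]}
        HAVING SUM(value) < 0
        ORDER BY city_id, day, total
        """
    ).fetchall()
    conn.close()
    return result


print(daily_negative_sums(ROWS, "day"))        # one row per city and day
print(daily_negative_sums(ROWS, "timestamp"))  # 2019-02-12 appears twice
```

Grouping by the day expression gives a single -700 row for 2019-02-12, while grouping by the raw timestamp splits it into -500 and -200.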
I checked it with this data (note: I changed the timestamp format to match BigQuery's format, although both formats give the same results):
WITH account_balances as (
SELECT 1 AS customer_id, 1 as city_id, -500 as value, '2019-02-12 00:00:00' as timestamp UNION ALL
SELECT 2 AS customer_id, 1 as city_id, -200 as value, '2019-02-12 00:00:00' as timestamp UNION ALL
SELECT 3 AS customer_id, 2 as city_id, 200 as value, '2019-02-10 00:00:00' as timestamp UNION ALL
SELECT 4 AS customer_id, 1 as city_id, -10 as value, '2019-02-09 00:00:00' as timestamp
)
SELECT city_id, FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP(timestamp)) as date,
SUM(value) as value
FROM `account_balances`
GROUP BY city_id, timestamp
HAVING value < 0
Here is the result: