I have the following table:
AccountID  Email             status_update  value  date (dd/mm/yyyy)
123456     foo@gmail.com     state1         19     02/02/2016
123456     foo@gmail.com     state2         20     10/10/2014
123456     foo@gmail.com     state2         35     15/10/2015
456123     bar@gmail.com     state2         45     05/04/2017
789123     foobar@gmail.com  state2         10     22/04/2016
789123     foobar@gmail.com  state1         22     17/06/2018
456345     cool@gmail.com    state1         10     13/08/2017
456345     cool@gmail.com    state2         05     09/07/2015
456345     cool@gmail.com    state2         17     09/07/2014
How do I return the earliest value when status_update = state2?
For example, looking at my table, I want to get the earliest state2 value for foo@gmail.com. For this account, that is 20 (dated 10/10/2014).
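As you might guess, I can't use a simple where status_update = state2 clause here, because this is part of a larger query whose final output is grouped by AccountID and Email.
Basically, I'd like to be able to do this, or something like it:
ARRAY_AGG(value WHERE status_update = state2 ORDER BY date ASC LIMIT 1)[OFFSET (0)] as Account_Status
I hope my question is clear. Thanks.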
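Answer 0 (score: 2)
This should work:
ARRAY_AGG(IF(status_update = 'state2', value, NULL)
IGNORE NULLS ORDER BY date ASC LIMIT 1)[OFFSET(0)] as Account_Status
The IF turns every row that is not state2 into NULL, and IGNORE NULLS drops those rows, so only state2 updates are aggregated.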
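To see the expression inside the kind of grouped query the question describes, here is a minimal sketch in BigQuery Standard SQL. It makes several assumptions: the sample rows from the question are inlined as a CTE called `sample` (a hypothetical name), the date is stored as a dd/mm/yyyy string in a column called `dt` (so it is parsed before ordering), and state2 is a string value. SAFE_OFFSET is swapped in for OFFSET as a defensive choice, since it returns NULL instead of raising an error when the index is out of range.

```sql
#standardSQL
-- Sample rows copied from the question; `sample` and `dt` are illustrative names.
WITH sample AS (
  SELECT 123456 AS accountid, 'foo@gmail.com' AS email, 'state1' AS status_update, 19 AS value, '02/02/2016' AS dt UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 20, '10/10/2014' UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 35, '15/10/2015' UNION ALL
  SELECT 456123, 'bar@gmail.com', 'state2', 45, '05/04/2017' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state2', 10, '22/04/2016' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state1', 22, '17/06/2018' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state1', 10, '13/08/2017' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 5, '09/07/2015' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 17, '09/07/2014'
)
SELECT
  accountid,
  email,
  -- Rows that are not state2 become NULL and are dropped by IGNORE NULLS;
  -- the remaining values are ordered by the parsed date and only the earliest is kept.
  ARRAY_AGG(
    IF(status_update = 'state2', value, NULL)
    IGNORE NULLS
    ORDER BY PARSE_DATE('%d/%m/%Y', dt) ASC
    LIMIT 1
  )[SAFE_OFFSET(0)] AS account_status
FROM sample
GROUP BY accountid, email
ORDER BY accountid
```

With the sample data this should produce one row per account, with account_status holding the value of that account's earliest state2 row. Other aggregates from the larger query can sit next to this expression in the same SELECT, because the state2 filter is applied inside the aggregate rather than in a WHERE clause.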
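Answer 1 (score: 2)
Below is for BigQuery Standard SQL (assuming you don't want any grouping and just want to tag every row of an account with an account status taken from the earliest state2 row of that account)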
#standardSQL
SELECT *,
FIRST_VALUE(IF(status_update = 'state2', value, NULL) IGNORE NULLS)
OVER(PARTITION BY email, accountid
ORDER BY PARSE_DATE('%d/%m/%Y', dt)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) account_Status
FROM `project.dataset.table`
-- ORDER BY accountid, email, PARSE_DATE('%d/%m/%Y', dt)
You can test and play with the above using the dummy data from your question, as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 123456 accountid, 'foo@gmail.com' email, 'state1' status_update, 19 value, '02/02/2016' dt UNION ALL
SELECT 123456, 'foo@gmail.com', 'state2', 20, '10/10/2014' UNION ALL
SELECT 123456, 'foo@gmail.com', 'state2', 35, '15/10/2015' UNION ALL
SELECT 456123, 'bar@gmail.com', 'state2', 45, '05/04/2017' UNION ALL
SELECT 789123, 'foobar@gmail.com', 'state2', 10, '22/04/2016' UNION ALL
SELECT 789123, 'foobar@gmail.com', 'state1', 22, '17/06/2018' UNION ALL
SELECT 456345, 'cool@gmail.com', 'state1', 10, '13/08/2017' UNION ALL
SELECT 456345, 'cool@gmail.com', 'state2', 05, '09/07/2015' UNION ALL
SELECT 456345, 'cool@gmail.com', 'state2', 17, '09/07/2014'
)
SELECT *,
FIRST_VALUE(IF(status_update = 'state2', value, NULL) IGNORE NULLS)
OVER(PARTITION BY email, accountid
ORDER BY PARSE_DATE('%d/%m/%Y', dt)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) account_Status
FROM `project.dataset.table`
ORDER BY accountid, email, PARSE_DATE('%d/%m/%Y', dt)
with the result below
Row  accountid  email             status_update  value  dt          account_Status
1    123456     foo@gmail.com     state2         20     10/10/2014  20
2    123456     foo@gmail.com     state2         35     15/10/2015  20
3    123456     foo@gmail.com     state1         19     02/02/2016  20
4    456123     bar@gmail.com     state2         45     05/04/2017  45
5    456345     cool@gmail.com    state2         17     09/07/2014  17
6    456345     cool@gmail.com    state2         5      09/07/2015  17
7    456345     cool@gmail.com    state1         10     13/08/2017  17
8    789123     foobar@gmail.com  state2         10     22/04/2016  10
9    789123     foobar@gmail.com  state1         22     17/06/2018  10
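Answer 2 (score: 0)
Try a window function as below. Order by status descending and date ascending, then pick the first row. That gives you the row you need, and you can do more with it if necessary. I haven't recreated the sample table or tested this, but I hope you get the idea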
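SELECT
  AccountID,
  value AS min_value
FROM
  (
    SELECT
      AccountID,
      Email,
      status_update,
      value,
      date,
      -- 'state2' sorts after 'state1', so DESC puts state2 rows first;
      -- date ASC then puts the earliest of them on top
      ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY status_update DESC, date ASC) AS row_num
    FROM
      <tablename>
  ) temp_tab
WHERE
  temp_tab.row_num = 1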
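Ordering by status descending only works because 'state2' happens to sort after 'state1', and an account with no state2 row would still return its earliest state1 row. A sketch of a variant that filters on state2 before numbering the rows follows. It assumes BigQuery Standard SQL, the placeholder table name `project.dataset.table`, and a dd/mm/yyyy string column named `dt`; none of these come from this answer.

```sql
#standardSQL
SELECT
  accountid,
  email,
  value AS earliest_state2_value
FROM (
  SELECT
    accountid,
    email,
    value,
    -- Number each account's state2 rows from earliest to latest,
    -- parsing the dd/mm/yyyy string so the order is chronological.
    ROW_NUMBER() OVER (
      PARTITION BY accountid, email
      ORDER BY PARSE_DATE('%d/%m/%Y', dt) ASC
    ) AS row_num
  FROM `project.dataset.table`
  WHERE status_update = 'state2'
) AS temp_tab
WHERE temp_tab.row_num = 1
```

Accounts without any state2 row drop out of this result, so joining it back to the larger grouped query would likely need a LEFT JOIN on accountid and email.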