Select a value based on two parameters: another column's string value and max(date) in BigQuery

Time: 2018-02-16 18:15:33

Tags: sql google-bigquery

I have the following table:

AccountID   Email               status_update   value   date (dd/mm/yyyy)
123456      foo@gmail.com       state1          19      02/02/2016
123456      foo@gmail.com       state2          20      10/10/2014
123456      foo@gmail.com       state2          35      15/10/2015
456123      bar@gmail.com       state2          45      05/04/2017
789123      foobar@gmail.com    state2          10      22/04/2016
789123      foobar@gmail.com    state1          22      17/06/2018
456345      cool@gmail.com      state1          10      13/08/2017
456345      cool@gmail.com      state2          05      09/07/2015
456345      cool@gmail.com      state2          17      09/07/2014

How can I return the earliest value when status_update = state2?

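For example, looking at my table, for foo@gmail.com I would like to get the earliest state2 value for this account, which is 20 (10/10/2014).

As you might guess, I can't use a simple where status_update = state2 clause here, because this is part of a larger query whose final output is grouped by AccountID and Email.

Basically, I would like to be able to do something like this: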

ARRAY_AGG(value WHERE status_update = state2 ORDER BY date ASC LIMIT 1)[OFFSET (0)] as Account_Status

I hope my question is clear. Thanks.

3 answers:

Answer 0: (score: 2)

This should work:

ARRAY_AGG(IF(status_update = 'state2', value, NULL)
          IGNORE NULLS ORDER BY date ASC LIMIT 1)[OFFSET (0)] as Account_Status

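The IF turns every value whose status is not state2 into NULL, and IGNORE NULLS drops those, so only the state2 updates are aggregated.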
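
For context, here is a minimal sketch of how that expression might sit inside the grouped query the question describes. It assumes the sample rows from the question and a date stored as a dd/mm/yyyy string in a column named dt (an assumed name). It also swaps OFFSET for SAFE_OFFSET so that an account with no state2 row yields NULL instead of risking an out-of-range error.

```sql
#standardSQL
-- Sample rows copied from the question; `sample` and `dt` are assumed names.
WITH sample AS (
  SELECT 123456 AS accountid, 'foo@gmail.com' AS email, 'state1' AS status_update, 19 AS value, '02/02/2016' AS dt UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 20, '10/10/2014' UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 35, '15/10/2015' UNION ALL
  SELECT 456123, 'bar@gmail.com', 'state2', 45, '05/04/2017' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state2', 10, '22/04/2016' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state1', 22, '17/06/2018' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state1', 10, '13/08/2017' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 5, '09/07/2015' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 17, '09/07/2014'
)
SELECT
  accountid,
  email,
  -- Keep only state2 values, sort them chronologically, take the first one.
  ARRAY_AGG(IF(status_update = 'state2', value, NULL) IGNORE NULLS
            ORDER BY PARSE_DATE('%d/%m/%Y', dt) ASC LIMIT 1)[SAFE_OFFSET(0)] AS account_status
FROM sample
GROUP BY accountid, email
ORDER BY accountid
```

The date is parsed because sorting dd/mm/yyyy strings alphabetically would order by day of month first. If the real column is already a DATE, the PARSE_DATE call can be dropped.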

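Answer 1: (score: 2)

Below is for BigQuery Standard SQL (assuming you don't want any grouping and just want to tag every row of an account with an account status based on the earliest state2 row in that account)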

#standardSQL
SELECT *,
  FIRST_VALUE(IF(status_update = 'state2', value, NULL) IGNORE NULLS) 
    OVER(PARTITION BY email, accountid 
      ORDER BY PARSE_DATE('%d/%m/%Y', dt) 
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) account_Status
FROM `project.dataset.table`
-- ORDER BY accountid, email, PARSE_DATE('%d/%m/%Y', dt)

You can test / play with the above using the dummy data from your question, as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123456 accountid, 'foo@gmail.com' email, 'state1' status_update, 19 value, '02/02/2016' dt UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 20, '10/10/2014' UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 35, '15/10/2015' UNION ALL
  SELECT 456123, 'bar@gmail.com', 'state2', 45, '05/04/2017' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state2', 10, '22/04/2016' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state1', 22, '17/06/2018' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state1', 10, '13/08/2017' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 05, '09/07/2015' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 17, '09/07/2014' 
)
SELECT *,
  FIRST_VALUE(IF(status_update = 'state2', value, NULL) IGNORE NULLS) 
    OVER(PARTITION BY email, accountid 
      ORDER BY PARSE_DATE('%d/%m/%Y', dt) 
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) account_Status
FROM `project.dataset.table`
ORDER BY accountid, email, PARSE_DATE('%d/%m/%Y', dt)  

with the result below

Row accountid   email               status_update   value   dt          account_Status   
1   123456      foo@gmail.com       state2          20      10/10/2014  20   
2   123456      foo@gmail.com       state2          35      15/10/2015  20   
3   123456      foo@gmail.com       state1          19      02/02/2016  20   
4   456123      bar@gmail.com       state2          45      05/04/2017  45   
5   456345      cool@gmail.com      state2          17      09/07/2014  17   
6   456345      cool@gmail.com      state2          5       09/07/2015  17   
7   456345      cool@gmail.com      state1          10      13/08/2017  17   
8   789123      foobar@gmail.com    state2          10      22/04/2016  10   
9   789123      foobar@gmail.com    state1          22      17/06/2018  10   

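Answer 2: (score: 0)

Try a window function as below. Order by status descending and date ascending, then select the first row. You will get the row you need and can do further operations on it if required. I have not recreated the sample table and tested this, but I hope you get the idea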

 SELECT
    AccountID,
    value as min_value
   FROM
    (
         SELECT
            AccountID,
            Email,
            status_update,
            value,
            DATE,
            ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY status_update DESC, DATE ASC) row_num
           FROM
            <tablename> ) temp_tab
  WHERE
    temp_tab.row_num = 1
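
Since the query above is untested, here is a hedged variant that can be run against the sample rows from the question in BigQuery Standard SQL. It makes two changes, both assumptions rather than part of the answer: it filters to state2 rows before numbering instead of relying on the sort order of status_update, and it parses the date string (column assumed to be named dt) so that rows are ordered chronologically.

```sql
#standardSQL
-- Sample rows copied from the question; `sample` and `dt` are assumed names.
WITH sample AS (
  SELECT 123456 AS accountid, 'foo@gmail.com' AS email, 'state1' AS status_update, 19 AS value, '02/02/2016' AS dt UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 20, '10/10/2014' UNION ALL
  SELECT 123456, 'foo@gmail.com', 'state2', 35, '15/10/2015' UNION ALL
  SELECT 456123, 'bar@gmail.com', 'state2', 45, '05/04/2017' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state2', 10, '22/04/2016' UNION ALL
  SELECT 789123, 'foobar@gmail.com', 'state1', 22, '17/06/2018' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state1', 10, '13/08/2017' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 5, '09/07/2015' UNION ALL
  SELECT 456345, 'cool@gmail.com', 'state2', 17, '09/07/2014'
)
SELECT
  accountid,
  email,
  value AS earliest_state2_value
FROM (
  SELECT
    accountid,
    email,
    value,
    -- Number each account's state2 rows from oldest to newest.
    ROW_NUMBER() OVER (PARTITION BY accountid, email
                       ORDER BY PARSE_DATE('%d/%m/%Y', dt)) AS row_num
  FROM sample
  WHERE status_update = 'state2'
) AS temp_tab
WHERE row_num = 1
ORDER BY accountid
```

Unlike the ordering-based version, an account with no state2 row simply drops out of the result here rather than returning its state1 row.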