Redshift: Getting the rank of a row, filtered by a condition

Time: 2014-09-09 13:34:19

Tags: sql amazon-redshift

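Every time I add a row to a table, I want to know where it ranks compared with the table up to that point. This is easily done with the RANK() window function. However, I'm struggling to find a way to work out where it ranks compared with the table up to that point when the table is filtered by a value.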

As an example, I want to end up with this highly contrived table:

    date    |   name  | animal_bought | num_sloths_bought_before | num_camels_bought_before
------------+---------+---------------+--------------------------+--------------------------
 2014-09-01 | Vincent | sloth         | 0                        | 0
 2014-09-01 | Luis    | camel         | 0                        | 0
 2014-09-02 | Vincent | sloth         | 1                        | 0
 2014-09-02 | Luis    | camel         | 0                        | 1
 2014-09-02 | Kevin   | sloth         | 0                        | 0
 2014-09-03 | Vincent | camel         | 2                        | 0
 2014-09-04 | Deo     | camel         | 0                        | 0
 2014-09-04 | Vincent | sloth         | 2                        | 1
 2014-09-05 | Luis    | camel         | 0                        | 2
 2014-09-05 | Andrew  | sloth         | 0                        | 0

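I initially wanted to see whether I could apply a filter to the window function (e.g. RANK() OVER(PARTITION BY name WHERE animal_bought = 'sloth' ORDER BY date ASC) AS num_sloths_bought_before), but this isn't syntactically correct. I then tried adding a subquery, as follows: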

SELECT
  date,
  name,
  animal_bought,
  ( SELECT
      RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
    FROM this_table
    WHERE animal_bought = 'sloth'
  ) AS num_sloths_bought_before
FROM source_table

But Redshift threw this error:

ERROR:  This type of correlated subquery pattern is not supported yet

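I also tried putting the window function in a case statement (which throws the same error) and calculating the rank in a joined query (which I couldn't get to work).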

1 answer:

Answer 0 (score: 2)

Hmm. I don't think this query would do what you want anyway:

SELECT date, name, animal_bought,
       (SELECT RANK() OVER(PARTITION BY name ORDER BY date ASC) - 1
        FROM this_table
        WHERE animal_bought = 'sloth'
       ) AS num_sloths_bought_before
FROM source_table

For several reasons:

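  • The use of rank() suggests that more than one row in this_table matches animal_bought. Otherwise, you could use an aggregation function.
  • If only one row matches the where clause, then rank() is always 1, because the where clause is processed before rank().
  • Your question mentions only one table, but your query has two.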
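The second point, that the where clause runs before rank(), can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for Redshift, which is an assumption made only so the example is runnable (window functions need SQLite 3.25 or later). The rows are Vincent's purchases from the question, and the helper name `run` is hypothetical.

```python
import sqlite3

# Vincent's purchases from the question's table (date, name, animal_bought).
ROWS = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
]

# WHERE is applied first, so rank() only ever sees the single camel row.
FILTER_THEN_RANK = """
SELECT date, RANK() OVER (PARTITION BY name ORDER BY date ASC) AS rnk
FROM source_table
WHERE animal_bought = 'camel'
"""

# Ranking in a derived table and filtering afterwards keeps the overall position.
RANK_THEN_FILTER = """
SELECT date, rnk
FROM (SELECT date, animal_bought,
             RANK() OVER (PARTITION BY name ORDER BY date ASC) AS rnk
      FROM source_table) AS ranked
WHERE animal_bought = 'camel'
"""

def run(query, rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)"
        )
        conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    print(run(FILTER_THEN_RANK, ROWS))
    print(run(RANK_THEN_FILTER, ROWS))
```

The first query ranks the camel purchase among camel rows only, while the second ranks it among all of Vincent's purchases before filtering.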

Perhaps you just want rank() without the subquery?

SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1 as NumberBoughtBefore
FROM source_table;
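As a rough check of what this query returns, here is a minimal sketch that runs it on a subset of the question's rows. SQLite (through Python's sqlite3 module) stands in for Redshift, which is an assumption. The column is written as animal_bought to match the question's table, and the function name `number_bought_before` is hypothetical.

```python
import sqlite3

# A subset of the question's sample rows (date, name, animal_bought).
ROWS = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Luis", "camel"),
]

# Partitioning by both name and animal restarts the rank for each animal,
# so rank - 1 counts earlier purchases of the same animal by the same buyer.
QUERY = """
SELECT date, name, animal_bought,
       RANK() OVER (PARTITION BY name, animal_bought ORDER BY date ASC) - 1
           AS number_bought_before
FROM source_table
ORDER BY name, date
"""

def number_bought_before(rows):
    """Load rows into an in-memory table and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)"
        )
        conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in number_bought_before(ROWS):
        print(row)
```

Each row gets a single count, for the animal bought on that row, which is why this form cannot fill a sloth column and a camel column at the same time.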

If you want counts for both animals, don't use rank(); use a cumulative sum:

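SELECT date, name, animal_bought,
       -- Redshift requires an explicit frame clause when an aggregate window function uses ORDER BY
       sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) as SlothsBefore,
       sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) as CamelsBefore
FROM source_table;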

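EDIT: The cumulative sum includes the current row, so subtract the current row's own contribution to count only earlier purchases: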

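SELECT date, name, animal_bought,
       -- running count per buyer up to and including this row, minus this row's own purchase
       -- (Redshift requires the explicit frame clause when an aggregate window function uses ORDER BY)
       (sum(case when animal_bought = 'sloth' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) -
        (case when animal_bought = 'sloth' then 1 else 0 end)
       ) as SlothsBefore,
       (sum(case when animal_bought = 'camel' then 1 else 0 end) over (partition by name order by date rows unbounded preceding) -
        (case when animal_bought = 'camel' then 1 else 0 end)
       ) as CamelsBefore
FROM source_table;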
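To see what the cumulative-sum approach produces, here is a minimal sketch that runs it on the ten purchases from the question. SQLite (through Python's sqlite3 module) stands in for Redshift, which is an assumption, and window functions need SQLite 3.25 or later. The column is written as animal_bought to match the question's table, an explicit frame clause is included, and the aliases `sloths_before` and `camels_before` and the function name `counts_before` are hypothetical.

```python
import sqlite3

# Purchases copied from the question's table (date, name, animal_bought).
ROWS = [
    ("2014-09-01", "Vincent", "sloth"),
    ("2014-09-01", "Luis", "camel"),
    ("2014-09-02", "Vincent", "sloth"),
    ("2014-09-02", "Luis", "camel"),
    ("2014-09-02", "Kevin", "sloth"),
    ("2014-09-03", "Vincent", "camel"),
    ("2014-09-04", "Deo", "camel"),
    ("2014-09-04", "Vincent", "sloth"),
    ("2014-09-05", "Luis", "camel"),
    ("2014-09-05", "Andrew", "sloth"),
]

# Running count per buyer including the current row, minus the current row's
# own contribution, leaves the number of earlier purchases of each animal.
QUERY = """
SELECT date, name, animal_bought,
       sum(case when animal_bought = 'sloth' then 1 else 0 end)
           over (partition by name order by date rows unbounded preceding)
         - (case when animal_bought = 'sloth' then 1 else 0 end) AS sloths_before,
       sum(case when animal_bought = 'camel' then 1 else 0 end)
           over (partition by name order by date rows unbounded preceding)
         - (case when animal_bought = 'camel' then 1 else 0 end) AS camels_before
FROM source_table
ORDER BY date, name
"""

def counts_before(rows):
    """Load rows into an in-memory table and run the cumulative-sum query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE source_table (date TEXT, name TEXT, animal_bought TEXT)"
        )
        conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in counts_before(ROWS):
        print(row)
```

Because the sample data has at most one purchase per buyer per day, the row-based frame is unambiguous here. If a buyer could make several purchases on the same date, the order among those rows would need a tiebreaker column.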