Question

For demonstration purposes, suppose I have a large table in Redshift (a billion rows or more) with two fields: id and win. win can be 0 or 1.

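Is there an efficient way to count how many times a sequence of wins matching a pattern such as 1000 occurs? In other words, if the table contains this data: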

+-----+-----+
| id  | win |
+-----+-----+
|  0  |  0  |
|  1  |  1  |
|  2  |  0  |
|  3  |  1  |
|  4  |  0  |
|  5  |  0  |
|  6  |  0  |
|  7  |  1  |
+-----+-----+

the query would return 1.

I think this could be answered in PostgreSQL, or possibly in plain SQL. Thanks.

Answer 1

One approach uses lag() or lead():

select t.*
from (select t.*, 
             lead(win, 1) over (order by id) as win_1,
             lead(win, 2) over (order by id) as win_2,
             lead(win, 3) over (order by id) as win_3
      from t
     ) t
where win = 1 and win_1 = 0 and win_2 = 0 and win_3 = 0;

I think Postgres will make efficient use of an index on (id, win). On billions of rows, though, this will not be fast.
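The query above returns the rows where a match starts, while the question asks for the number of matches. A minimal sketch of the counting variant follows, assuming the table is named t as in the answer; the alias s and the column name pattern_count are only illustrative. Against the sample data in the question, the only row that satisfies all four conditions is id 3, so this should give 1.

```sql
-- Assumption: the table is named t and has columns id and win, as in the answer above.
-- Count every position where the pattern 1, 0, 0, 0 begins.
select count(*) as pattern_count
from (select win,
             lead(win, 1) over (order by id) as win_1,  -- value one row ahead
             lead(win, 2) over (order by id) as win_2,  -- value two rows ahead
             lead(win, 3) over (order by id) as win_3   -- value three rows ahead
      from t
     ) s
where win = 1 and win_1 = 0 and win_2 = 0 and win_3 = 0;
```

Near the end of the table, lead() returns NULL for rows that do not exist, so a trailing 1 with fewer than three rows after it is not counted.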

How to count instances of a matching pattern in Redshift / PostgreSQL