Question

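My goal is to take a set of data ordered by id and return a result set that indicates the number of consecutive rows in which the val column is identical. For example, given this data: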

| id | val |
|  1 |  33 |
|  2 |  33 |
|  3 |  44 |
|  4 |  28 |
|  5 |  44 |
|  6 |  44 |

I would like to see this result:

| id | val | run_length |
| 1  | 33  | 2          |
| 3  | 44  | 1          |
| 4  | 28  | 1          |
| 5  | 44  | 2          |

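The id column in the result set is optional. In fact, if it makes this significantly harder, just leave that column out of the result. I kind of like having it because it "pins" the result set to a particular location in the table.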
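To pin down the expected output independently of any SQL dialect, here is a minimal reference sketch in Python. It is not one of the requested SQL solutions; the function name `expected_runs` is made up for illustration, and it only restates the transformation described above so that candidate queries can be checked against it.

```python
from itertools import groupby

def expected_runs(rows):
    """rows: iterable of (id, val) pairs.
    Returns [(first_id, val, run_length), ...] for consecutive equal vals."""
    ordered = sorted(rows, key=lambda row: row[0])  # order by id
    result = []
    # groupby only merges *adjacent* equal keys, which is exactly a "run".
    for val, group in groupby(ordered, key=lambda row: row[1]):
        members = list(group)
        result.append((members[0][0], val, len(members)))
    return result

sample = [(1, 33), (2, 33), (3, 44), (4, 28), (5, 44), (6, 44)]
for first_id, val, run_length in expected_runs(sample):
    print(first_id, val, run_length)
```

On the sample data this yields the four rows of the desired result table, with the first id of each run in the id column.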

I am primarily interested in results for free database engines. My order of preference for a solution is:

SQLite
Postgres
MySQL
Oracle
SQL Server
Sybase

Answer 1

I'm going to pick #2 on your list (Postgres), because doing this in SQLite with a single query is very painful. The following is standard SQL:

select min(id), val, count(*) as runlength
from (select t.*,
             (row_number() over (order by id) -
              row_number() over (partition by val order by id)
             ) as grp
      from data t
     ) t
group by grp, val;

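This uses the difference of two row number calculations to identify runs of identical values. It should work in recent versions of databases 2, 4, 5, and 6 on your list (Postgres, Oracle, SQL Server, and Sybase).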
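To see why the difference of the two row numbers identifies a run, it helps to print the intermediate columns. The sketch below runs the same idea through Python's `sqlite3` module; it assumes the bundled SQLite is version 3.25 or later (the release that added window functions), and the column names `rn_all` and `rn_val` as well as the final `order by min(id)` are additions for illustration, not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (id integer primary key, val integer)")
conn.executemany(
    "insert into data (id, val) values (?, ?)",
    [(1, 33), (2, 33), (3, 44), (4, 28), (5, 44), (6, 44)],
)

# Intermediate view: overall row number, row number within each val,
# and their difference, which stays constant inside a run.
steps = conn.execute("""
    select id, val,
           row_number() over (order by id) as rn_all,
           row_number() over (partition by val order by id) as rn_val,
           row_number() over (order by id)
             - row_number() over (partition by val order by id) as grp
    from data
    order by id
""").fetchall()
for row in steps:
    print(row)

# The grouping step from the answer, with an explicit ordering added.
runs = conn.execute("""
    select min(id), val, count(*) as runlength
    from (select t.*,
                 (row_number() over (order by id) -
                  row_number() over (partition by val order by id)
                 ) as grp
          from data t
         ) t
    group by grp, val
    order by min(id)
""").fetchall()
print(runs)
```

In the intermediate output, ids 4 and 5 both end up with grp = 3 even though their values differ (28 and 44). That is why the outer query groups by both grp and val rather than by grp alone.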

Answer 2

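I've been poking around the RLE (run-length encoding) space in SQLite and came across this post. I believe this code works for #1 (SQLite). The first answer is right that this is a bit painful to do as a single query in SQLite.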

create table example (id integer primary key autoincrement, val integer);

insert into example (val) values (33);
insert into example (val) values (33);
insert into example (val) values (44);
insert into example (val) values (28);
insert into example (val) values (44);
insert into example (val) values (44);


select ren.low_id, e2.val, (ren.high_id - ren.low_id)+1
from example e2
inner join (
select min(hb.low_id) as low_id, hb.high_id as high_id
from 
(
    with nexample(low_id, high_id, val) 
    as 
    (
    select e.id, e.id, e.val from example e
    union all
    select ne.low_id, eu.id, ne.val 
    from nexample ne
    inner join example eu on eu.id = ne.high_id+1 AND eu.val=ne.val
    )
    select ne.low_id, max(ne.high_id) as high_id from nexample ne
    group by ne.low_id
) hb
group by hb.high_id
) ren on ren.low_id = e2.id;

Output:

1|33|2
3|44|1
4|28|1
5|44|2

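Note that this solution performs poorly on very sparse sets, that is, data with long runs of the same value. I am looking for an alternative approach for handling sparse sets.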

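For example, on a set of 10,000 rows where val can take values in [0,1] but every value is 0, this code takes about 2 minutes 30 seconds to run on my hardware. Not great.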
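The slowdown follows from the shape of the recursive CTE: for every starting row it emits one row per extension of the run, so a single run of n rows produces n(n+1)/2 intermediate rows (about 50 million for n = 10,000). One possible alternative, sketched below and not benchmarked against the timing quoted above, is to select only the rows that start a run and then look up where the next different value begins. It makes the same assumption as the query above that ids are contiguous (no gaps), and it needs no window functions. The helper names `make_table` and `runs_by_run_starts` are invented for this illustration.

```python
import sqlite3

RUN_STARTS_SQL = """
    select s.id, s.val,
           coalesce(
               -- first later row whose value differs from this run's value
               (select x.id from example x
                 where x.id > s.id and x.val <> s.val
                 order by x.id limit 1),
               -- no such row: the run extends to the end of the table
               (select max(id) + 1 from example)
           ) - s.id as run_length
    from example s
    left join example p on p.id = s.id - 1
    -- keep only run starts: no previous row, or previous value differs
    where p.id is null or p.val <> s.val
    order by s.id
"""

def make_table(values):
    """Build an in-memory table shaped like the one in this answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table example (id integer primary key autoincrement, val integer)"
    )
    conn.executemany("insert into example (val) values (?)", [(v,) for v in values])
    return conn

def runs_by_run_starts(conn):
    return conn.execute(RUN_STARTS_SQL).fetchall()

print(runs_by_run_starts(make_table([33, 33, 44, 28, 44, 44])))
print(runs_by_run_starts(make_table([0] * 10000)))  # the all-zero case
```

Because run_length is computed as a difference of ids, any gap in the id sequence would both split a run and inflate a length, so this sketch should only be used where the contiguity assumption holds.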

SQL query for run-length encoding of consecutive identical values