Find consecutive patterns (with SQL)

Time: 2017-11-06 14:17:33

Tags: sql postgresql gaps-and-islands

Table `consecutive` in PostgreSQL: each `p_id` has `se_id` values from 0 to 100 (here 0 to 9).

Search pattern:

SELECT * FROM consecutive WHERE val_3_bool = 1 AND val_1_dur > 4100 AND val_1_dur < 5900

[Image: source table]

Now I am looking for the longest consecutive run of this pattern for each `p_id`.

[Image: result table]

Is it possible to compute this in pure SQL?


2 answers:

Answer 0 (score: 2)

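One method is the difference-of-row-numbers approach, which identifies each sequence (run of consecutive rows with the same value):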

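select pid, count(*) as in_a_row, sum(val1_dur) as dur
from (select t.*,
             -- position of the row within its pid
             row_number() over (partition by pid order by idx) as seqnum,
             -- position of the row among rows with the same pid and val3_bool
             row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
      from consecutive t
     ) t
-- seqnum - seqnum_d is constant within each run of equal val3_bool values
group by (seqnum - seqnum_d), pid, val3_bool;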

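If you are looking only for runs of "1" values, add where val3_bool = 1 to the outer query. To understand why this works, I suggest you stare at the results of the subquery: within a run of equal val3_bool values both row numbers increase by one per row, so their difference stays constant, and it jumps whenever rows with the other value intervene. That difference therefore labels each group of consecutive values.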
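To see the difference trick without a database, here is a minimal Python sketch that mimics the subquery and the outer group by. The sample rows are hypothetical, the tuple layout (pid, idx, val3_bool, val1_dur) only mirrors the column names used in the query, and `annotate` and `runs` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical sample rows: (pid, idx, val3_bool, val1_dur)
rows = [
    (1, 0, 1, 5000),
    (1, 1, 1, 4200),
    (1, 2, 0, 3000),
    (1, 3, 1, 5100),
    (1, 4, 1, 4500),
    (1, 5, 1, 5800),
    (2, 0, 0, 1000),
    (2, 1, 1, 4300),
    (2, 2, 1, 4400),
]


def annotate(rows):
    """Mimic the subquery: attach seqnum and seqnum_d to every row."""
    seqnum = defaultdict(int)    # row_number() over (partition by pid order by idx)
    seqnum_d = defaultdict(int)  # row_number() over (partition by pid, val3_bool order by idx)
    out = []
    for pid, idx, flag, dur in sorted(rows, key=lambda r: (r[0], r[1])):
        seqnum[pid] += 1
        seqnum_d[(pid, flag)] += 1
        out.append((pid, idx, flag, dur, seqnum[pid], seqnum_d[(pid, flag)]))
    return out


def runs(rows):
    """Mimic the outer query: group by (seqnum - seqnum_d), pid, val3_bool."""
    groups = defaultdict(lambda: [0, 0])  # key -> [in_a_row, dur]
    for pid, _idx, flag, dur, s, s_d in annotate(rows):
        group = groups[(s - s_d, pid, flag)]
        group[0] += 1    # count(*)
        group[1] += dur  # sum(val1_dur)
    return {key: tuple(val) for key, val in groups.items()}


for row in annotate(rows):
    print(row)  # the last two columns are seqnum and seqnum_d
print(runs(rows))
```

For pid 1, the first two 1s (idx 0 and 1) get difference 0, while the three trailing 1s (idx 3 to 5) all get difference 1 and form a single run of length 3. Grouping also on val3_bool matters because a run of 0s and a run of 1s can end up with the same difference.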

Then you can use distinct on to get the longest run for each pid:
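select distinct on (pid) t.*
from (select pid, count(*) as in_a_row, sum(val1_dur) as dur
      from (select t.*,
                   -- position of the row within its pid
                   row_number() over (partition by pid order by idx) as seqnum,
                   -- position of the row among rows with the same pid and val3_bool
                   row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
            from consecutive t
           ) t
      -- optionally: where val3_bool = 1 (only runs of 1s)
      group by (seqnum - seqnum_d), pid, val3_bool
     ) t
-- distinct on keeps the first row per pid in this order: the longest run
order by pid, in_a_row desc;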

distinct on does not actually require the extra level of subquery, but I think the subquery makes the logic clearer.
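The distinct on (pid) step keeps, for each pid, the first row under order by pid, in_a_row desc, i.e. the longest run. A minimal Python sketch of that semantics, using hypothetical per-run rows (pid, in_a_row, dur); `distinct_on_pid` is an illustrative name, not a library function:

```python
# Hypothetical per-run rows, as (pid, in_a_row, dur) from the inner query
run_rows = [
    (1, 2, 9200),
    (1, 3, 15400),
    (1, 1, 3000),
    (2, 1, 1000),
    (2, 2, 8700),
]


def distinct_on_pid(rows):
    """Keep the first row per pid after ordering by pid, in_a_row desc,
    mirroring `select distinct on (pid) ... order by pid, in_a_row desc`."""
    result = {}
    for row in sorted(rows, key=lambda r: (r[0], -r[1])):
        result.setdefault(row[0], row)  # the first row seen for each pid wins
    return list(result.values())


print(distinct_on_pid(run_rows))  # [(1, 3, 15400), (2, 2, 8700)]
```

One difference to keep in mind: if two runs of the same pid tie on in_a_row, PostgreSQL may return either of them unless more order by keys break the tie, whereas Python's stable sort keeps whichever came first in the input.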

Answer 1 (score: 0)