Using window functions to reduce the number of rows by storing date ranges

Time: 2016-06-20 14:20:31

Tags: postgresql postgresql-9.3 window-functions

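I want to reduce the number of rows in a table that holds a lot of repeated data. My first idea was to use window functions to compute date ranges and store those in the table, so that whenever I need the information the date range simply acts as a delimiter in the join condition. But then I noticed that some of the resulting ranges overlap, so I'm not sure what the best approach is.

I'm using Postgres 9.3.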

select  distinct    
    min(obs_date) over (partition by equipment, temperature) as beg_obs_date,
    max(obs_date) over (partition by equipment, temperature) as end_obs_date,
    equipment, 
    temperature
from    
(   select  generate_series('2016-05-01', '2016-05-08', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    union all   
    select  generate_series('2016-05-09', '2016-05-15', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -20.00::real as temperature
    union all
    select  generate_series('2016-05-16', '2016-06-10', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
) sq

I get:

beg_obs_date    end_obs_date    equipment   temperature
2016-05-01      2016-06-10      FREEZER_1   -15,2
2016-05-09      2016-05-15      FREEZER_1   -20

What I want is:

beg_obs_date    end_obs_date    equipment   temperature
2016-05-01      2016-05-08      FREEZER_1   -15,2
2016-05-09      2016-05-15      FREEZER_1   -20
2016-05-16      2016-06-10      FREEZER_1   -15,2

Any ideas?

Thanks!

1 answer:

Answer 0: (score: 1)

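Use row_number() to mark consecutive series. The difference between the row number within one temperature and the overall row number stays constant along an unbroken run and changes as soon as the temperature changes. The data with packet added (slightly simplified):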

with the_data as (
    select  generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    union all   
    select  generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -20.00::real as temperature
    union all
    select  generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    )
select 
    *, 
    row_number() over (partition by equipment, temperature order by obs_date)
        - row_number() over (partition by equipment order by obs_date) as packet
from the_data
order by equipment, obs_date;

  obs_date  | equipment | temperature | packet 
------------+-----------+-------------+--------
 2016-05-01 | FREEZER_1 |       -15.2 |      0
 2016-05-02 | FREEZER_1 |       -15.2 |      0
 2016-05-03 | FREEZER_1 |       -15.2 |      0
 2016-05-04 | FREEZER_1 |         -20 |     -3
 2016-05-05 | FREEZER_1 |         -20 |     -3
 2016-05-06 | FREEZER_1 |       -15.2 |     -2
 2016-05-07 | FREEZER_1 |       -15.2 |     -2
 2016-05-08 | FREEZER_1 |       -15.2 |     -2
(8 rows)
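
To see where the packet values come from, the two row numbers can be shown as separate columns. This is only a minimal sketch on the same sample data. The column names rn_in_temperature and rn_in_equipment are made up for illustration, and the second numbering restarts per equipment, an assumption that only matters once the table holds more than one device.

```sql
-- Sketch: expose both row numbers whose difference forms "packet".
with the_data as (
    select  generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    union all
    select  generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -20.00::real as temperature
    union all
    select  generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    )
select
    obs_date,
    temperature,
    -- position of the row among rows with the same equipment and temperature
    row_number() over (partition by equipment, temperature order by obs_date) as rn_in_temperature,
    -- position of the row among all rows of the same equipment (assumed scope)
    row_number() over (partition by equipment order by obs_date) as rn_in_equipment,
    -- constant within an unbroken run of one temperature
    row_number() over (partition by equipment, temperature order by obs_date)
        - row_number() over (partition by equipment order by obs_date) as packet
from the_data
order by equipment, obs_date;
```

For 2016-05-04 and 2016-05-05 the two numbers are 1, 2 and 4, 5, so the difference is -3 on both rows. For 2016-05-06 to 2016-05-08 they are 4, 5, 6 and 6, 7, 8, which gives -2.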

In min() and max(), partition by packet in addition to temperature. A packet value identifies a run only within a single temperature, so two runs with different temperatures can end up with the same value:

with the_data as (
    select  generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    union all   
    select  generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -20.00::real as temperature
    union all
    select  generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    )
select distinct
    min(obs_date) over (partition by equipment, temperature, packet) as beg_obs_date,
    max(obs_date) over (partition by equipment, temperature, packet) as end_obs_date,
    equipment, 
    temperature
from (
    select 
        *, 
        row_number() over (partition by equipment, temperature order by obs_date)
            - row_number() over (partition by equipment order by obs_date) as packet
    from the_data
) s
order by 1;

 beg_obs_date | end_obs_date | equipment | temperature 
--------------+--------------+-----------+-------------
 2016-05-01   | 2016-05-03   | FREEZER_1 |       -15.2
 2016-05-04   | 2016-05-05   | FREEZER_1 |         -20
 2016-05-06   | 2016-05-08   | FREEZER_1 |       -15.2
(3 rows)
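
As a variation on the query above, the same collapse can probably be written with GROUP BY instead of SELECT DISTINCT over window aggregates, which states the intent (one output row per run) more directly. This is a sketch and not part of the original answer. The CTE name numbered is made up for illustration, and the query assumes at most one row per equipment and obs_date. It groups by temperature as well as packet and numbers rows per equipment, both of which are assumptions meant to keep runs apart when there are several devices or many temperature changes.

```sql
-- Sketch: collapse each consecutive run into one row with GROUP BY.
with the_data as (
    select  generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    union all
    select  generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -20.00::real as temperature
    union all
    select  generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
        'FREEZER_1'::varchar as equipment,
        -15.20::real as temperature
    ),
numbered as (
    -- tag every row with the identifier of the run it belongs to
    select
        *,
        row_number() over (partition by equipment, temperature order by obs_date)
            - row_number() over (partition by equipment order by obs_date) as packet
    from the_data
    )
select
    min(obs_date) as beg_obs_date,
    max(obs_date) as end_obs_date,
    equipment,
    temperature
from numbered
group by equipment, temperature, packet
order by equipment, beg_obs_date;
```

On the sample data this should produce the same three ranges as the table above.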