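I want to reduce the number of rows in a table that holds a lot of repeated data. My first idea was to use window functions to define date ranges to store in the table, so that whenever I need this information, the date range is just a delimiter in the join condition. But then I noticed that some of the ranges overlap, so I'm not sure what the best approach is.
I'm using Postgres 9.3.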
select distinct
min(obs_date) over (partition by equipment, temperature) as beg_obs_date,
max(obs_date) over (partition by equipment, temperature) as end_obs_date,
equipment,
temperature
from
( select generate_series('2016-05-01', '2016-05-08', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
union all
select generate_series('2016-05-09', '2016-05-15', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-20.00::real as temperature
union all
select generate_series('2016-05-16', '2016-06-10', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
) sq
I get:
beg_obs_date end_obs_date equipment temperature
2016-05-01 2016-06-10 FREEZER_1 -15,2
2016-05-09 2016-05-15 FREEZER_1 -20
What I want is:
beg_obs_date end_obs_date equipment temperature
2016-05-01 2016-05-08 FREEZER_1 -15,2
2016-05-09 2016-05-15 FREEZER_1 -20
2016-05-16 2016-06-10 FREEZER_1 -15,2
Any ideas?
Thanks!
Answer 0 (score: 1)
Use row_number() to tell the consecutive series apart. The query below adds a packet column to the data (slightly simplified):
with the_data as (
select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
union all
select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-20.00::real as temperature
union all
select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
)
select
*,
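-- both counters are partitioned by equipment so each unit is numbered on its own
row_number() over (partition by equipment, temperature order by obs_date) - row_number() over (partition by equipment order by obs_date) as packet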
from the_data
obs_date | equipment | temperature | packet
------------+-----------+-------------+--------
2016-05-01 | FREEZER_1 | -15.2 | 0
2016-05-02 | FREEZER_1 | -15.2 | 0
2016-05-03 | FREEZER_1 | -15.2 | 0
2016-05-04 | FREEZER_1 | -20 | -3
2016-05-05 | FREEZER_1 | -20 | -3
2016-05-06 | FREEZER_1 | -15.2 | -2
2016-05-07 | FREEZER_1 | -15.2 | -2
2016-05-08 | FREEZER_1 | -15.2 | -2
(8 rows)
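To see why packet stays constant within a run, it can help to look at the two row numbers side by side. This is only an illustrative sketch: the aliases rn_in_temperature and rn_overall are names made up here, the overall counter is partitioned by equipment (which gives the same numbers for this single-freezer sample), and it assumes one reading per equipment per day.

```sql
with the_data as (
    select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -15.20::real as temperature
    union all
    select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -20.00::real as temperature
    union all
    select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -15.20::real as temperature
)
select
    obs_date,
    temperature,
    -- position among rows with the same equipment and temperature
    row_number() over (partition by equipment, temperature order by obs_date) as rn_in_temperature,
    -- position among all rows of the same equipment
    row_number() over (partition by equipment order by obs_date) as rn_overall
from the_data
order by obs_date;
```

Within a run both counters advance by one per row, so their difference does not change. When the temperature changes, rn_overall keeps counting while rn_in_temperature resumes where that temperature last stopped, so the difference shifts to a new value.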
Then use packet in the partitions of max() and min(). Keep temperature in the partition as well: two different temperatures can end up with the same packet value, so packet alone does not identify a run:
with the_data as (
select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
union all
select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-20.00::real as temperature
union all
select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
'FREEZER_1'::varchar as equipment,
-15.20::real as temperature
)
select distinct
min(obs_date) over (partition by equipment, temperature, packet) as beg_obs_date,
max(obs_date) over (partition by equipment, temperature, packet) as end_obs_date,
equipment,
temperature
from (
select
*,
-- both counters are partitioned by equipment so each unit is numbered on its own
row_number() over (partition by equipment, temperature order by obs_date) - row_number() over (partition by equipment order by obs_date) as packet
from the_data
) s
order by 1;
beg_obs_date | end_obs_date | equipment | temperature
--------------+--------------+-----------+-------------
2016-05-01 | 2016-05-03 | FREEZER_1 | -15.2
2016-05-04 | 2016-05-05 | FREEZER_1 | -20
2016-05-06 | 2016-05-08 | FREEZER_1 | -15.2
(3 rows)
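The same result can probably be obtained with a plain GROUP BY instead of select distinct over window aggregates. The following is a sketch under that assumption, not part of the original answer; the CTE name numbered is made up here. It groups by temperature as well as packet, because readings that alternate between two temperatures day by day can produce the same packet value for both.

```sql
with the_data as (
    select generate_series('2016-05-01', '2016-05-03', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -15.20::real as temperature
    union all
    select generate_series('2016-05-04', '2016-05-05', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -20.00::real as temperature
    union all
    select generate_series('2016-05-06', '2016-05-08', '1 day'::interval)::date as obs_date,
           'FREEZER_1'::varchar as equipment,
           -15.20::real as temperature
),
numbered as (
    select
        *,
        -- difference of the two counters is constant within one consecutive run
        row_number() over (partition by equipment, temperature order by obs_date)
      - row_number() over (partition by equipment order by obs_date) as packet
    from the_data
)
select
    min(obs_date) as beg_obs_date,
    max(obs_date) as end_obs_date,
    equipment,
    temperature
from numbered
-- one output row per run: packet is only unique within an (equipment, temperature) pair
group by equipment, temperature, packet
order by equipment, beg_obs_date;
```

For the sample data this returns the same three ranges as the query above. Once the ranges are stored, they can act as the join delimiter described in the question, with a condition along the lines of obs_date between beg_obs_date and end_obs_date.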