Average over a hard-to-define partition

Time: 2012-11-05 18:05:32

Tags: sql postgresql null aggregate-functions window-functions

I have this table:

create table t (value int, dt date);

 value |     dt     
-------+------------
    10 | 2012-10-30
    15 | 2012-10-29
  null | 2012-10-28
  null | 2012-10-27
     7 | 2012-10-26
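
To reproduce the table above, a setup sketch may help. The insert statement is not part of the original question; its values are simply copied from the listing:

```sql
-- Assumed setup: populate t with the five sample rows shown above.
INSERT INTO t (value, dt) VALUES
    (10,   '2012-10-30'),
    (15,   '2012-10-29'),
    (NULL, '2012-10-28'),
    (NULL, '2012-10-27'),
    (7,    '2012-10-26');

-- Show the rows in the order the question talks about (newest first).
SELECT value, dt FROM t ORDER BY dt DESC;
```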

I want this output:

 value |     dt     
-------+------------
    10 | 2012-10-30
     5 | 2012-10-29
     5 | 2012-10-28
     5 | 2012-10-27
     7 | 2012-10-26

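With the table ordered by date descending, I want each run of nulls, together with the non-null value immediately before it, to be replaced by that non-null value divided evenly across those rows. In this example, 15 is the previous non-null value for the next two nulls, so all three rows get 15 / 3 = 5.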

SQL Fiddle

2 Answers:

Answer 0 (score: 4)

I found a surprisingly simple solution:

SELECT max(value) OVER (PARTITION BY grp)
      / count(*)  OVER (PARTITION BY grp) AS value
      ,dt
FROM   (
   SELECT *, count(value) OVER (ORDER BY dt DESC) AS grp
   FROM   t
   ) a;

-> sqlfiddle

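Since count() ignores NULL values, a running count (the default frame of a window function that has an ORDER BY) groups the values cheaply (-> grp): the count increases only on rows with a non-null value, so each non-null row and the nulls that follow it share the same grp.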
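
To make the grouping step visible, the inner query can be run on its own. This is only an illustration against the sample data from the question; the outer ORDER BY is added here for readable output:

```sql
-- Running count of non-null values, newest date first.
-- NULL rows do not advance the count, so they inherit the
-- group number of the preceding non-null row.
SELECT value, dt, count(value) OVER (ORDER BY dt DESC) AS grp
FROM   t
ORDER  BY dt DESC;

-- Result for the sample data:
--  value |     dt     | grp
-- -------+------------+-----
--     10 | 2012-10-30 |   1
--     15 | 2012-10-29 |   2
--        | 2012-10-28 |   2
--        | 2012-10-27 |   2
--      7 | 2012-10-26 |   3
```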

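Every group has exactly one non-null value, so min / max / sum all return the same result in a second window function. Divide by the number of members in the grp (count(*) this time, so that NULL rows are counted too!) and we are done.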
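
One caveat: with an int column, the division is integer division in PostgreSQL, so a group such as 10 spread over 3 rows would be truncated. A possible variant, assuming fractional results are wanted, casts to numeric and names the window once (the WINDOW clause and the cast are additions, not part of the answer's query):

```sql
-- Same technique, but dividing as numeric to avoid integer truncation.
SELECT max(value::numeric) OVER w / count(*) OVER w AS value
     , dt
FROM  (
   SELECT *, count(value) OVER (ORDER BY dt DESC) AS grp
   FROM   t
   ) a
WINDOW w AS (PARTITION BY grp)   -- one group per non-null value
ORDER  BY dt DESC;
```

Rows that come before the first non-null value in descending date order land in group 0, which has no non-null member, so their result stays NULL.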

Answer 1 (score: 1)

As a puzzle, here is a solution... in practice it might perform horribly, depending on the nature of your data. In any case, watch your indexes:

create database tmp;
create table t (value float, dt date); -- if you use int, you need to care about rounding
insert into t values (10, '2012-10-30'), (15, '2012-10-29'), (null, '2012-10-28'), (null, '2012-10-27'), (7, '2012-10-26');

select t1.dt, t1.value, t2.dt, t2.value, count(*) cnt 
from t t1, t t2, t t3 
where 
    t2.dt >= t1.dt and t2.value is not null 
    and not exists (
        select * 
        from t 
        where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
    ) 
    and t3.dt <= t2.dt 
    and not exists (
        select * 
        from t where t.dt >= t3.dt and t.dt < t2.dt and t.value is not null
    ) 
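group by t1.dt, t1.value, t2.dt, t2.value; -- every non-aggregated column is listed so this also runs on PostgreSQL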

+------------+-------+------------+-------+-----+
| dt         | value | dt         | value | cnt |
+------------+-------+------------+-------+-----+
| 2012-10-26 |     7 | 2012-10-26 |     7 |   1 |
| 2012-10-27 |  NULL | 2012-10-29 |    15 |   3 |
| 2012-10-28 |  NULL | 2012-10-29 |    15 |   3 |
| 2012-10-29 |    15 | 2012-10-29 |    15 |   3 |
| 2012-10-30 |    10 | 2012-10-30 |    10 |   1 |
+------------+-------+------------+-------+-----+
5 rows in set (0.00 sec)

select dt, value/cnt 
from (
    select t1.dt , t2.value, count(*) cnt 
    from t t1, t t2, t t3 
    where 
        t2.dt >= t1.dt and t2.value is not null 
        and not exists (
            select * 
            from t 
            where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
        ) 
    and t3.dt <= t2.dt 
    and not exists (
        select * 
        from t 
        where t.dt >= t3.dt and t.dt < t2.dt and t.value is not null
    ) 
    group by t1.dt, t2.value -- every non-aggregated column is listed so this also runs on PostgreSQL
) x;

+------------+-----------+
| dt         | value/cnt |
+------------+-----------+
| 2012-10-26 |         7 |
| 2012-10-27 |         5 |
| 2012-10-28 |         5 |
| 2012-10-29 |         5 |
| 2012-10-30 |        10 |
+------------+-----------+
5 rows in set (0.00 sec)

Explanation:

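  • t1 is the original table
  • t2 is the row with the lowest date, at or after t1's date, that has a non-null value
  • t3 is all the rows in between (the rows that share the same t2), so we can group by the others and count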
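
The t1-to-t2 step in the list above can be looked at in isolation. The following is a sketch only; the aliases anchor_dt and anchor_value are made up for illustration, and the join is written explicitly instead of with the comma syntax:

```sql
-- For each row t1, find t2: the nearest row at or after t1.dt
-- that has a non-null value (no other non-null row lies in between).
select t1.dt, t1.value, t2.dt as anchor_dt, t2.value as anchor_value
from t t1
join t t2
  on t2.dt >= t1.dt and t2.value is not null
where not exists (
    select *
    from t
    where t.dt < t2.dt and t.dt >= t1.dt and t.value is not null
)
order by t1.dt;
```

For the sample data, the rows dated 2012-10-27, 2012-10-28 and 2012-10-29 all map to the 2012-10-29 row (value 15), which is the group that t3 then counts.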

Sorry I can't be clearer. It's confusing for me too :-)