SQL sum per group across multiple sub-groups

Date: 2012-11-17 18:58:20

Tags: sql postgresql aggregate-functions greatest-n-per-group

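There was a fairly big mistake in my previous question:

select earliest date from multiple rows

The answer by horse_with_no_name returns a perfect result, and I am very grateful, but my original question was wrong, for which I apologize. Consider the table below: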

circuit_uid  |customer_name     |rack_location  |reading_date   | reading_time | amps | volts  |  kw  | kwh | kva  |  pf  |  key 
--------------------------------------------------------------------------------------------------------------------------------------
cu1.cb1.r1    | Customer 1       | 12.01.a1      | 2012-01-02   | 00:01:01     | 4.51 | 229.32 | 1.03 |  87 | 1.03 | 0.85 |    15
cu1.cb1.r1    | Customer 1       | 12.01.a1      | 2012-01-02   | 01:01:01     | 4.18 | 230.3  | 0.96 |  90 | 0.96 | 0.84 |    16
cu1.cb1.r2    | Customer 1       | 12.01.a1      | 2012-01-02   | 00:01:01     | 4.51 | 229.32 | 1.03 |  21 | 1.03 | 0.85 |    15
cu1.cb1.r2    | Customer 1       | 12.01.a1      | 2012-01-02   | 01:01:01     | 4.18 | 230.3  | 0.96 |  23 | 0.96 | 0.84 |    16
cu1.cb1.s2    | Customer 2       | 10.01.a1      | 2012-01-02   | 00:01:01     | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009
cu1.cb1.s2    | Customer 2       | 10.01.a1      | 2012-01-02   | 01:01:01     | 9.07 |  228.4 | 2.07 | 182 | 2.07 | 0.85 | 24010
cu1.cb1.s3    | Customer 2       | 10.01.a1      | 2012-01-02   | 00:01:01     | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009
cu1.cb1.s3    | Customer 2       | 10.01.a1      | 2012-01-02   | 01:01:01     | 9.07 |  228.4 | 2.07 | 124 | 2.07 | 0.85 | 24010
cu1.cb1.r1    | Customer 3       | 01.01.a1      | 2012-01-02   | 00:01:01     | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003 
cu1.cb1.r1    | Customer 3       | 01.01.a1      | 2012-01-02   | 01:01:01     | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004
cu1.cb1.r4    | Customer 3       | 01.01.a1      | 2012-01-02   | 00:01:01     | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003 
cu1.cb1.r4    | Customer 3       | 01.01.a1      | 2012-01-02   | 01:01:01     | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004

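As you can see, each customer now has multiple circuits. The result should therefore be, for each customer, the sum of the earliest kwh reading of each of that customer's circuits. For this table the result would be: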

customer_name | kwh(sum)
--------------+-----------
customer 1    | 108      (the result of 87 + 21)  
customer 2    | 300      (the result of 179 + 121)  
customer 3    | 438      (the result of 223 + 215)   

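Each customer will have more than 2 circuits, and readings may occur at different times, hence the need for the "earliest" reading.

Does anyone have any suggestions for the revised question?

PostgreSQL 8.4 on CentOS / Red Hat.

2 answers:

Answer 0 (score: 2)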

SELECT customer_name, sum(kwh) AS kwh_total
FROM  (
    SELECT DISTINCT ON (customer_name, circuit_uid)
           customer_name, circuit_uid, kwh
    FROM   readings
    WHERE  reading_date = '2012-01-02'::date
    ORDER  BY customer_name, circuit_uid, reading_time
    ) x
GROUP  BY 1

Same as before, just pick the earliest reading per (customer_name, circuit_uid). Then sum per customer_name.

Index

A multi-column index like the following should make this very fast:
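The two steps can be traced outside the database with a small sketch. This is only an illustration in plain Python, not part of the answer: the rows are a trimmed copy of the question's table (only the columns the query uses), and the names `rows`, `earliest`, and `totals` are made up for the example.

```python
# Trimmed sample data from the question:
# (customer_name, circuit_uid, reading_time, kwh), all on 2012-01-02.
rows = [
    ("Customer 1", "cu1.cb1.r1", "00:01:01", 87),
    ("Customer 1", "cu1.cb1.r1", "01:01:01", 90),
    ("Customer 1", "cu1.cb1.r2", "00:01:01", 21),
    ("Customer 1", "cu1.cb1.r2", "01:01:01", 23),
    ("Customer 2", "cu1.cb1.s2", "00:01:01", 179),
    ("Customer 2", "cu1.cb1.s2", "01:01:01", 182),
    ("Customer 2", "cu1.cb1.s3", "00:01:01", 121),
    ("Customer 2", "cu1.cb1.s3", "01:01:01", 124),
    ("Customer 3", "cu1.cb1.r1", "00:01:01", 223),
    ("Customer 3", "cu1.cb1.r1", "01:01:01", 226),
    ("Customer 3", "cu1.cb1.r4", "00:01:01", 215),
    ("Customer 3", "cu1.cb1.r4", "01:01:01", 217),
]

# Step 1: keep the earliest reading per (customer_name, circuit_uid).
# Zero-padded HH:MM:SS strings sort in chronological order.
earliest = {}
for customer, circuit, reading_time, kwh in rows:
    key = (customer, circuit)
    if key not in earliest or reading_time < earliest[key][0]:
        earliest[key] = (reading_time, kwh)

# Step 2: sum the kept kwh values per customer_name.
totals = {}
for (customer, _circuit), (_reading_time, kwh) in earliest.items():
    totals[customer] = totals.get(customer, 0) + kwh

print(totals)
```

With the sample rows this yields the totals the question asks for: 108, 300, and 438.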

CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);

Answer 1 (score: 1)

This is an extension of the answer to your original question:

select customer_name,
       sum(kwh)
from (
   select customer_name,
          kwh,
          reading_time,
          reading_date,
          row_number() over (partition by customer_name, circuit_uid order by reading_time) as rn
   from readings
   where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name

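Note the new sum() in the outer query and the changed partition by definition in the inner query (compared to your previous question). It now picks the first reading for each circuit_uid of a customer, instead of the first reading for each customer.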
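To check the window-function query against the sample data without a PostgreSQL server, it can be run through Python's built-in sqlite3 module. This is a rough sketch under several assumptions: SQLite stands in for PostgreSQL 8.4, the bundled SQLite must be 3.25 or newer for window functions, the table is trimmed to the columns the query uses, dates and times are stored as text, and the PostgreSQL literal `date '2012-01-02'` is replaced by a plain string because SQLite has no such syntax. The `ORDER BY` in the outer query is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    " customer_name TEXT, circuit_uid TEXT,"
    " reading_date TEXT, reading_time TEXT, kwh INTEGER)"
)

# Trimmed sample data from the question.
sample = [
    ("Customer 1", "cu1.cb1.r1", "2012-01-02", "00:01:01", 87),
    ("Customer 1", "cu1.cb1.r1", "2012-01-02", "01:01:01", 90),
    ("Customer 1", "cu1.cb1.r2", "2012-01-02", "00:01:01", 21),
    ("Customer 1", "cu1.cb1.r2", "2012-01-02", "01:01:01", 23),
    ("Customer 2", "cu1.cb1.s2", "2012-01-02", "00:01:01", 179),
    ("Customer 2", "cu1.cb1.s2", "2012-01-02", "01:01:01", 182),
    ("Customer 2", "cu1.cb1.s3", "2012-01-02", "00:01:01", 121),
    ("Customer 2", "cu1.cb1.s3", "2012-01-02", "01:01:01", 124),
    ("Customer 3", "cu1.cb1.r1", "2012-01-02", "00:01:01", 223),
    ("Customer 3", "cu1.cb1.r1", "2012-01-02", "01:01:01", 226),
    ("Customer 3", "cu1.cb1.r4", "2012-01-02", "00:01:01", 215),
    ("Customer 3", "cu1.cb1.r4", "2012-01-02", "01:01:01", 217),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", sample)

# Same shape as the answer's query: number the readings of each
# (customer_name, circuit_uid) by time, keep the first, sum per customer.
query = """
SELECT customer_name, sum(kwh)
FROM (
   SELECT customer_name,
          kwh,
          row_number() OVER (PARTITION BY customer_name, circuit_uid
                             ORDER BY reading_time) AS rn
   FROM readings
   WHERE reading_date = '2012-01-02'
) t
WHERE rn = 1
GROUP BY customer_name
ORDER BY customer_name
"""
result = conn.execute(query).fetchall()
print(result)
```

Partitioning by customer_name alone would keep a single reading per customer, which is why circuit_uid has to be part of the partition.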