在我之前的问题中有一个相当大的错误
select earliest date from multiple rows
horse_with_no_name的答案会返回一个完美的结果,我非常感激,但是我自己的初步问题错了,所以我真的很道歉;如果你看下面的表格;
circuit_uid |customer_name |rack_location |reading_date | reading_time | amps | volts | kw | kwh | kva | pf | key -------------------------------------------------------------------------------------------------------------------------------------- cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 87 | 1.03 | 0.85 | 15 cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 90 | 0.96 | 0.84 | 16 cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 21 | 1.03 | 0.85 | 15 cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 23 | 0.96 | 0.84 | 16 cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009 cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 182 | 2.07 | 0.85 | 24010 cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009 cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 124 | 2.07 | 0.85 | 24010 cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003 cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004 cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003 cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004
正如您所看到的,每个客户现在都有多个电路。因此,结果现在是每个客户每个电路的每个最早kwh读数的总和,因此该表中的结果将是;
customer_name | kwh(sum)
--------------+-----------
customer 1 | 108 (the result of 87 + 21)
customer 2 | 300 (the result of 179 + 121)
customer 3 | 438 (the result of 223 + 215)
每个客户将有超过2个电路,读数可能会在不同时间发生,因此需要“最早”读数。
是否有人对修订后的问题有任何建议?
CentOs / Redhat上的PostgreSQL 8.4。
答案 0 :(得分:2)
SELECT customer_name, sum(kwh) AS kwh_total
FROM (
SELECT DISTINCT ON (customer_name, circuit_uid)
customer_name, circuit_uid, kwh
FROM readings
WHERE reading_date = '2012-01-02'::date
ORDER BY customer_name, circuit_uid, reading_time
) x
GROUP BY 1
与before相同,只需按(customer_name, circuit_uid)
选择最早的。{
然后按customer_name
求和。
如下所示的multi-column index会使非常快:
CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);
答案 1 :(得分:1)
这是对原始问题的扩展:
select customer_name,
sum(kwh)
from (
select customer_name,
kwh,
reading_time,
reading_date,
row_number() over (partition by customer_name, circuit_uid order by reading_time) as rn
from readings
where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name
请注意外部查询中的新sum()
和内部查询中的更改partition by
定义(与之前的问题相比),它现在计算每个circuit_uid
的第一个读数(而不是每个客户的第一个。)