此问题与this之一有关。我有一个包含设备功率值的表,我需要计算给定时间跨度的功耗并返回10个最耗电的设备。我已生成192个设备和7742208个测量记录(每个40324)。这大约是一个月内设备会产生多少记录。
对于这个数据量,我当前的查询需要40多秒来执行,这太多了,因为时间跨度和设备和测量的数量可能要高得多。我是否应该尝试使用不同于滞后的方法来解决这个问题()OVER PARTITION以及可以进行哪些其他优化?我非常感谢有关代码示例的建议。
PostgreSQL版本9.4
使用示例值查询:
SELECT
t.device_id,
sum(len_y*(extract(epoch from len_x))) AS total_consumption
FROM (
SELECT
m.id,
m.device_id,
m.power_total,
m.created_at,
m.power_total+lag(m.power_total) OVER (
PARTITION BY device_id
ORDER BY m.created_at
) AS len_y,
m.created_at-lag(m.created_at) OVER (
PARTITION BY device_id
ORDER BY m.created_at
) AS len_x
FROM
measurements AS m
WHERE m.created_at BETWEEN '2015-07-30 13:05:24.403552+00'::timestamp
AND '2015-08-27 12:34:59.826837+00'::timestamp
) AS t
GROUP BY t.device_id
ORDER BY total_consumption
DESC LIMIT 10;
表格信息:
Column | Type | Modifiers
--------------+--------------------------+----------------------------------------------------------
id | integer | not null default nextval('measurements_id_seq'::regclass)
created_at | timestamp with time zone | default timezone('utc'::text, now())
power_total | real |
device_id | integer | not null
Indexes:
"measurements_pkey" PRIMARY KEY, btree (id)
"measurements_device_id_idx" btree (device_id)
"measurements_created_at_idx" btree (created_at)
Foreign-key constraints:
"measurements_device_id_fkey" FOREIGN KEY (device_id) REFERENCES devices(id)
查询计划:
Limit (cost=1317403.25..1317403.27 rows=10 width=24) (actual time=41077.091..41077.094 rows=10 loops=1)
-> Sort (cost=1317403.25..1317403.73 rows=192 width=24) (actual time=41077.089..41077.092 rows=10 loops=1)
Sort Key: (sum((((m.power_total + lag(m.power_total) OVER (?))) * date_part('epoch'::text, ((m.created_at - lag(m.created_at) OVER (?)))))))
Sort Method: top-N heapsort Memory: 25kB
-> GroupAggregate (cost=1041700.67..1317399.10 rows=192 width=24) (actual time=25361.013..41076.562 rows=192 loops=1)
Group Key: m.device_id
-> WindowAgg (cost=1041700.67..1201314.44 rows=5804137 width=20) (actual time=25291.797..37839.727 rows=7742208 loops=1)
-> Sort (cost=1041700.67..1056211.02 rows=5804137 width=20) (actual time=25291.746..30699.993 rows=7742208 loops=1)
Sort Key: m.device_id, m.created_at
Sort Method: external merge Disk: 257344kB
-> Seq Scan on measurements m (cost=0.00..151582.05 rows=5804137 width=20) (actual time=0.333..5112.851 rows=7742208 loops=1)
Filter: ((created_at >= '2015-07-30 13:05:24.403552'::timestamp without time zone) AND (created_at <= '2015-08-27 12:34:59.826837'::timestamp without time zone))
Planning time: 0.351 ms
Execution time: 41114.883 ms
查询生成测试表和数据:
CREATE TABLE measurements (
id serial primary key,
device_id integer,
power_total real,
created_at timestamp
);
INSERT INTO measurements(
device_id,
created_at,
power_total
)
SELECT
device_id,
now() + (i * interval '1 minute'),
random()*(50-1)+1
FROM (
SELECT
DISTINCT(device_id),
generate_series(0,10) AS i
FROM (
SELECT
generate_series(1,5) AS device_id
) AS dev_ids
) AS gen_table;
答案 0 :(得分:1)
我会尝试将部分计算移动到行插入阶段。
添加新列:
alter table measurements add consumption real;
更新专栏:
with m1 as (
select
id, power_total, created_at,
lag(power_total) over (partition by device_id order by created_at) prev_power_total,
lag(created_at) over (partition by device_id order by created_at) prev_created_at
from measurements
)
update measurements m2
set consumption =
(m1.power_total+ m1.prev_power_total)*
extract(epoch from m1.created_at- m1.prev_created_at)
from m1
where m2.id = m1.id;
创建一个触发器:
create or replace function before_insert_on_measurements()
returns trigger language plpgsql
as $$
declare
rec record;
begin
select power_total, created_at into rec
from measurements
where device_id = new.device_id
order by created_at desc
limit 1;
new.consumption:=
(new.power_total+ rec.power_total)*
extract(epoch from new.created_at- rec.created_at);
return new;
end $$;
create trigger before_insert_on_measurements
before insert on measurements
for each row execute procedure before_insert_on_measurements();
查询:
select device_id, sum(consumption) total_consumption
from measurements
-- where conditions
group by 1
order by 1
答案 1 :(得分:0)