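I have two fairly large tables that I need to join on a date range. Unfortunately, the query takes more than 12 hours. I am using PostgreSQL 10.5 running in Docker with at most 5 GB of RAM and up to 12 CPU cores.
Basically, in the left table I have an equipment ID and a list of date ranges (from = Timestamp, to = ValidUntil). I then want to join the right table, which holds the measurements (sensor data) for all equipment, so that I only get the sensor data that falls within one of the date ranges from the left table. The query: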
select
A.*,
B."Timestamp" as "PressureTimestamp",
B."PropertyValue" as "Pressure"
from A
inner join B
on B."EquipmentId" = A."EquipmentId"
and B."Timestamp" >= A."Timestamp"
and B."Timestamp" < A."ValidUntil"
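Unfortunately, this query uses only one core, which may be why it runs so slowly. Is there a way to rewrite the query so that it can be parallelized?
Indexes: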
create index if not exists A_eq_timestamp_validUntil on public.A using btree ("EquipmentId", "Timestamp", "ValidUntil");
create index if not exists B_eq_timestamp on public.B using btree ("EquipmentId", "Timestamp");
Tables:
-- contains 332,000 rows
CREATE TABLE A (
"EquipmentId" bigint,
"Timestamp" timestamp without time zone,
"ValidUntil" timestamp without time zone
)
WITH ( OIDS = FALSE );
-- contains 70,000,000 rows
CREATE TABLE B
(
"EquipmentId" bigint,
"Timestamp" timestamp without time zone,
"PropertyValue" double precision
)
WITH ( OIDS = FALSE );
Execution plan (EXPLAIN ... output):
Nested Loop (cost=176853.59..59023908.95 rows=941684055 width=48)
-> Bitmap Heap Scan on v2_pressure p (cost=176853.16..805789.35 rows=9448335 width=24)
Recheck Cond: ("EquipmentId" = 2956235)
-> Bitmap Index Scan on v2_pressure_eq (cost=0.00..174491.08 rows=9448335 width=0)
Index Cond: ("EquipmentId" = 2956235)
-> Index Scan using v2_prs_eq_timestamp_validuntil on v2_prs prs (cost=0.42..5.16 rows=100 width=32)
Index Cond: (("EquipmentId" = 2956235) AND (p."Timestamp" >= "Timestamp") AND (p."Timestamp" < "ValidUntil"))
Update 1: Fixed the indexes as suggested in the comments, which improved performance considerably.
Answer 0 (score: 1)
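Fixing the indexes is the first remedy for the slowness, but it only helps up to a point. Given how large your tables are, I suggest you try PostgreSQL table partitioning, which has some built-in support.
However, you need some filter or partitioning criterion. I don't see any WHERE clause in your query, so I can't suggest one. Perhaps you could try EquipmentId. This may also help with parallelism.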
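As a rough illustration of partitioning by EquipmentId, here is a minimal sketch using the declarative range partitioning available in PostgreSQL 10. The table name `b_part`, the partition names, and the boundary values are all hypothetical. Real boundaries would have to be chosen from the actual distribution of equipment IDs. In PostgreSQL 10, indexes have to be created on each partition separately.

```sql
-- Hypothetical partitioned copy of B; names and boundaries are assumptions.
CREATE TABLE b_part (
    "EquipmentId"   bigint NOT NULL,
    "Timestamp"     timestamp without time zone NOT NULL,
    "PropertyValue" double precision
) PARTITION BY RANGE ("EquipmentId");

-- Three example partitions covering the whole bigint range.
CREATE TABLE b_part_p0 PARTITION OF b_part
    FOR VALUES FROM (MINVALUE) TO (1000000);
CREATE TABLE b_part_p1 PARTITION OF b_part
    FOR VALUES FROM (1000000) TO (2000000);
CREATE TABLE b_part_p2 PARTITION OF b_part
    FOR VALUES FROM (2000000) TO (MAXVALUE);

-- PostgreSQL 10 does not propagate indexes from the parent table,
-- so each partition gets its own index.
CREATE INDEX ON b_part_p0 ("EquipmentId", "Timestamp");
CREATE INDEX ON b_part_p1 ("EquipmentId", "Timestamp");
CREATE INDEX ON b_part_p2 ("EquipmentId", "Timestamp");

-- Rows are routed to the matching partition on insert.
INSERT INTO b_part ("EquipmentId", "Timestamp", "PropertyValue")
SELECT "EquipmentId", "Timestamp", "PropertyValue"
FROM B;

ANALYZE b_part;
```

Whether this actually speeds up the join depends on the data and the plan the planner picks, so it is worth comparing EXPLAIN output before and after.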
Answer 1 (score: 0)
-- \i tmp.sql
CREATE TABLE A
( equipmentid bigint NOT NULL
, ztimestamp timestamp without time zone NOT NULL
, validuntil timestamp without time zone NOT NULL
, PRIMARY KEY (equipmentid,ztimestamp)
, UNIQUE (equipmentid,validuntil) -- must be unique, since the intervals don't overlap
) ;
-- contains 70,000,000 rows
CREATE TABLE B
( equipmentid bigint NOT NULL
, ztimestamp timestamp without time zone NOT NULL
, propertyvalue double precision
, PRIMARY KEY (equipmentid,ztimestamp)
) ;
INSERT INTO B(equipmentid,ztimestamp,propertyvalue)
SELECT i,t, random()
FROM generate_series(1,1000) i
CROSS JOIN generate_series('2018-09-01','2018-09-30','1day'::interval) t
;
INSERT INTO A(equipmentid,ztimestamp,validuntil)
SELECT equipmentid,ztimestamp, ztimestamp+ '7 days'::interval
FROM B
WHERE date_part('dow', ztimestamp) =0
;
ANALYZE A;
ANALYZE B;
EXPLAIN
SELECT
A.*,
B.ztimestamp AS pressuretimestamp,
B.propertyvalue AS pressure
FROM A
INNER JOIN B
ON B.equipmentid = A.equipmentid
AND B.ztimestamp >= A.ztimestamp
AND B.ztimestamp < A.validuntil
WHERE A.equipmentid=333 -- I added this; the plan in the question also has a restriction on the ID
;
And the resulting plan:
SET
ANALYZE
ANALYZE
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.34..21.26 rows=17 width=40)
-> Index Scan using a_equipmentid_validuntil_key on a (cost=0.17..4.34 rows=5 width=24)
Index Cond: (equipmentid = 333)
-> Index Scan using b_pkey on b (cost=0.17..3.37 rows=3 width=24)
Index Cond: ((equipmentid = 333) AND (ztimestamp >= a.ztimestamp) AND (ztimestamp < a.validuntil))
(5 rows)
This is with my current setting of random_page_cost=1.1;
After setting it to 4.0, I get the same plan as the OP:
SET
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=35.13..54561.69 rows=1416136 width=40) (actual time=1.391..1862.275 rows=225540 loops=1)
-> Bitmap Heap Scan on aa2 (cost=34.71..223.52 rows=1345 width=24) (actual time=1.173..5.223 rows=1345 loops=1)
Recheck Cond: (equipmentid = 5)
Heap Blocks: exact=9
-> Bitmap Index Scan on aa2_equipmentid_validuntil_key (cost=0.00..34.38 rows=1345 width=0) (actual time=1.047..1.048 rows=1345 loops=1)
Index Cond: (equipmentid = 5)
-> Index Scan using bb2_pkey on bb2 (cost=0.42..29.87 rows=1053 width=24) (actual time=0.109..0.757 rows=168 loops=1345)
Index Cond: ((equipmentid = 5) AND (ztimestamp >= aa2.ztimestamp) AND (ztimestamp < aa2.validuntil))
Planning Time: 3.167 ms
Execution Time: 2168.967 ms
(10 rows)
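To reproduce the comparison, the setting can be changed for the current session only and the plan inspected each time. The following is a minimal sketch against the test tables A and B defined in this answer. The value 1.1 is the one used above, and 4.0 is the PostgreSQL default. Which value is appropriate depends on the storage, so treat both as starting points rather than recommendations.

```sql
-- Affects only the current session; the server configuration is unchanged.
SET random_page_cost = 1.1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT
    A.*,
    B.ztimestamp    AS pressuretimestamp,
    B.propertyvalue AS pressure
FROM A
INNER JOIN B
    ON  B.equipmentid = A.equipmentid
    AND B.ztimestamp >= A.ztimestamp
    AND B.ztimestamp <  A.validuntil
WHERE A.equipmentid = 333;

-- Switch to the default (4.0) and rerun the same EXPLAIN to compare plans.
SET random_page_cost = 4.0;

-- Return to whatever the server configuration specifies.
RESET random_page_cost;
```

Note that EXPLAIN (ANALYZE) executes the query, so on the full-size tables it is safer to keep the equipmentid restriction while experimenting.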