postgresql - Join between two large tables takes a very long time

Asked: 2018-09-26 06:56:33

Tags: sql postgresql performance

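I have two fairly large tables, and I need to do a date-range join between them. Unfortunately, the query takes more than 12 hours. I'm using PostgreSQL 10.5 running in Docker with at most 5 GB of RAM and up to 12 CPU cores.

Basically, the left table holds an equipment ID and a list of date ranges (from = Timestamp, to = ValidUntil). I then want to join the right table, which holds the measurements (sensor data) for all equipment, so that I only get the sensor data that falls within one of the date ranges from the left table. The query: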

select
    A.*,
    B."Timestamp" as "PressureTimestamp",
    B."PropertyValue" as "Pressure"
from A
inner join B
    on  B."EquipmentId" =  A."EquipmentId"
    and B."Timestamp"   >= A."Timestamp"
    and B."Timestamp"   <  A."ValidUntil"

Unfortunately, this query uses only one core, which may be why it runs so slowly. Is there a way to rewrite the query so that it can be parallelized?

Indexes:

create index if not exists A_eq_timestamp_validUntil on public.A using btree ("EquipmentId", "Timestamp", "ValidUntil");
create index if not exists B_eq_timestamp on public.B using btree ("EquipmentId", "Timestamp");

Tables:

-- contains 332,000 rows
CREATE TABLE A (
    "EquipmentId" bigint,
    "Timestamp" timestamp without time zone,
    "ValidUntil" timestamp without time zone
)
WITH ( OIDS = FALSE );

-- contains 70,000,000 rows
CREATE TABLE B
(
    "EquipmentId" bigint,
    "Timestamp" timestamp without time zone,
    "PropertyValue" double precision
)
WITH ( OIDS = FALSE );

Execution plan (EXPLAIN ... output):

Nested Loop  (cost=176853.59..59023908.95 rows=941684055 width=48)
  ->  Bitmap Heap Scan on v2_pressure p  (cost=176853.16..805789.35 rows=9448335 width=24)
        Recheck Cond: ("EquipmentId" = 2956235)
        ->  Bitmap Index Scan on v2_pressure_eq  (cost=0.00..174491.08 rows=9448335 width=0)
              Index Cond: ("EquipmentId" = 2956235)
  ->  Index Scan using v2_prs_eq_timestamp_validuntil on v2_prs prs  (cost=0.42..5.16 rows=100 width=32)
        Index Cond: (("EquipmentId" = 2956235) AND (p."Timestamp" >= "Timestamp") AND (p."Timestamp" < "ValidUntil"))

Update 1: Fixed the indexes as suggested in the comments, which improved performance considerably.

2 Answers:

Answer 0: (score: 1)

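Fixing the indexes is the first remedy for the slowness, but it only helps up to a point. Given how large your tables are, I suggest you try Postgres partitioning. Postgres has some built-in support for it.

However, you need some filter or partitioning criterion. I don't see any WHERE clause in your query, so I can't recommend a specific one. Maybe you could try EquipmentId. This could also help with parallelism.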
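
As a rough illustration of that suggestion, here is a minimal sketch of declarative range partitioning on "EquipmentId", using syntax available in PostgreSQL 10. The names b_part, b_part_0 and b_part_1 and the boundary value 2000000 are assumptions made up for this example; real boundaries would have to follow the actual distribution of equipment IDs.

```sql
-- Hypothetical partitioned copy of B; table names and the boundary are assumptions.
CREATE TABLE b_part (
    "EquipmentId"   bigint,
    "Timestamp"     timestamp without time zone,
    "PropertyValue" double precision
) PARTITION BY RANGE ("EquipmentId");

-- Two example partitions that split the key space at an arbitrary value.
CREATE TABLE b_part_0 PARTITION OF b_part
    FOR VALUES FROM (MINVALUE) TO (2000000);
CREATE TABLE b_part_1 PARTITION OF b_part
    FOR VALUES FROM (2000000) TO (MAXVALUE);

-- In PostgreSQL 10 indexes are created on each partition individually.
CREATE INDEX b_part_0_eq_ts ON b_part_0 ("EquipmentId", "Timestamp");
CREATE INDEX b_part_1_eq_ts ON b_part_1 ("EquipmentId", "Timestamp");

-- Copy the data; rows without an equipment ID can never match the join
-- and are not accepted by a range-partitioned table, so they are skipped.
INSERT INTO b_part ("EquipmentId", "Timestamp", "PropertyValue")
SELECT "EquipmentId", "Timestamp", "PropertyValue"
FROM B
WHERE "EquipmentId" IS NOT NULL;

ANALYZE b_part;
```

With a filter on the partition key, such as the "EquipmentId" = 2956235 condition visible in the plan from the question, the planner can skip partitions whose bounds exclude that value. Without any such filter every partition is still scanned, so partitioning alone may not shorten the full join.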

Answer 1: (score: 0)

-- \i tmp.sql

CREATE TABLE A
        ( equipmentid bigint NOT NULL
        , ztimestamp timestamp without time zone NOT NULL
        , validuntil timestamp without time zone NOT NULL
        , PRIMARY KEY (equipmentid,ztimestamp)
        , UNIQUE (equipmentid,validuntil) -- must be unique, since the intervals don't overlap
        ) ;

-- contains 70,000,000 rows
CREATE TABLE B
        ( equipmentid bigint NOT NULL
        , ztimestamp timestamp without time zone NOT NULL
        , propertyvalue double precision
        , PRIMARY KEY (equipmentid,ztimestamp)
        ) ;

INSERT INTO B(equipmentid,ztimestamp,propertyvalue)
SELECT i,t, random()
FROM generate_series(1,1000) i
CROSS JOIN generate_series('2018-09-01','2018-09-30','1day'::interval) t
        ;


INSERT INTO A(equipmentid,ztimestamp,validuntil)
SELECT equipmentid,ztimestamp, ztimestamp+ '7 days'::interval
FROM B
WHERE date_part('dow', ztimestamp) =0
        ;

ANALYZE A;
ANALYZE B;

EXPLAIN
SELECT
    A.*,
    B.ztimestamp AS pressuretimestamp,
    B.propertyvalue AS pressure
FROM A
INNER JOIN B
    ON  B.equipmentid =  A.equipmentid
    AND B.ztimestamp   >= A.ztimestamp
    AND B.ztimestamp   <  A.validuntil
    WHERE A.equipmentid=333 -- I added this; the plan in the question also has a restriction on the id
        ;

And the resulting plan:


SET
ANALYZE
ANALYZE
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.34..21.26 rows=17 width=40)
   ->  Index Scan using a_equipmentid_validuntil_key on a  (cost=0.17..4.34 rows=5 width=24)
         Index Cond: (equipmentid = 333)
   ->  Index Scan using b_pkey on b  (cost=0.17..3.37 rows=3 width=24)
         Index Cond: ((equipmentid = 333) AND (ztimestamp >= a.ztimestamp) AND (ztimestamp < a.validuntil))
(5 rows)

This is with my current setting random_page_cost=1.1;

After setting it to 4.0, I get the same plan as the OP:


SET
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=35.13..54561.69 rows=1416136 width=40) (actual time=1.391..1862.275 rows=225540 loops=1)
   ->  Bitmap Heap Scan on aa2  (cost=34.71..223.52 rows=1345 width=24) (actual time=1.173..5.223 rows=1345 loops=1)
         Recheck Cond: (equipmentid = 5)
         Heap Blocks: exact=9
         ->  Bitmap Index Scan on aa2_equipmentid_validuntil_key  (cost=0.00..34.38 rows=1345 width=0) (actual time=1.047..1.048 rows=1345 loops=1)
               Index Cond: (equipmentid = 5)
   ->  Index Scan using bb2_pkey on bb2  (cost=0.42..29.87 rows=1053 width=24) (actual time=0.109..0.757 rows=168 loops=1345)
         Index Cond: ((equipmentid = 5) AND (ztimestamp >= aa2.ztimestamp) AND (ztimestamp < aa2.validuntil))
 Planning Time: 3.167 ms
 Execution Time: 2168.967 ms
(10 rows)
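
The comparison above could be reproduced in a single session roughly as follows. This is a sketch that assumes the test tables A and B created earlier in this answer; the two values are the ones mentioned in the text, and SET only affects the current session.

```sql
-- Show the value currently in effect for this session.
SHOW random_page_cost;

-- Low random-access cost, as in the first plan.
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE)
SELECT A.*, B.ztimestamp AS pressuretimestamp, B.propertyvalue AS pressure
FROM A
INNER JOIN B
    ON  B.equipmentid = A.equipmentid
    AND B.ztimestamp >= A.ztimestamp
    AND B.ztimestamp <  A.validuntil
WHERE A.equipmentid = 333;

-- PostgreSQL's default value, as in the second plan.
SET random_page_cost = 4.0;
EXPLAIN (ANALYZE)
SELECT A.*, B.ztimestamp AS pressuretimestamp, B.propertyvalue AS pressure
FROM A
INNER JOIN B
    ON  B.equipmentid = A.equipmentid
    AND B.ztimestamp >= A.ztimestamp
    AND B.ztimestamp <  A.validuntil
WHERE A.equipmentid = 333;

-- Return to the configured default afterwards.
RESET random_page_cost;
```

Which plan the planner picks at each setting depends on table sizes and statistics, so the output on another data set may differ from the plans shown here.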