使用“ OR”将单个表中的两列连接起来太慢

时间:2019-05-08 04:55:03

标签: sql postgresql

我正在尝试获取上个月最后一天到当月最后一天(时间为'15:00:00')的总交易次数和金额。

表结构

CREATE TABLE mjr_agent
(
    id bigint NOT NULL UNIQUE,
    head_agent_id varchar(20),
    code varchar(20) NOT NULL UNIQUE,
    password varchar(200) NOT NULL,
    agent_type smallint NOT NULL,
    agent_group smallint NOT NULL,
    division varchar(100) NOT NULL,
    township varchar(200) NOT NULL,
    shop_name varchar(200),
    contact_name varchar(150) NOT NULL,
    ph_no varchar(50),
    contact_ph_no varchar(50),
    address varchar(500),
    record_reg_id bigint NOT NULL,
    record_upd_id bigint NOT NULL,
    record_reg_date timestamp DEFAULT current_timestamp NOT NULL,
    record_upd_date timestamp DEFAULT current_timestamp NOT NULL,
    user_id bigint,
    PRIMARY KEY (id)
) WITHOUT OIDS;

CREATE TABLE mjr_transaction_item
(
    id bigint NOT NULL UNIQUE,
    exl_id bigint NOT NULL,
    sr_no int NOT NULL,
    txn_id varchar(20) NOT NULL,
    txn_date timestamp NOT NULL,
    txn_status smallint NOT NULL,
    src_id varchar(20) NOT NULL,
    src_type smallint NOT NULL,
    src_name varchar(100),
    src_ocpt varchar(100),
    dest_id varchar(20) NOT NULL,
    dest_type smallint NOT NULL,
    dest_name varchar(100),
    desc_ocpt varchar(100),
    amount numeric,
    reason varchar(250),
    medium smallint NOT NULL,
    record_reg_id bigint NOT NULL,
    record_upd_id bigint NOT NULL,
    record_reg_date timestamp DEFAULT current_timestamp NOT NULL,
    record_upd_date timestamp DEFAULT current_timestamp NOT NULL,
    PRIMARY KEY (id)
) WITHOUT OIDS;

每个表中的总记录

mjr_agent = 700和mjr_transaction_item = 1136043

查询

SELECT
    agnt.ID AS agent_id,
    COUNT ( DISTINCT txi.ID ) AS total_transaction,
    SUM ( txi.amount ) AS total_amount
FROM
    mjr_agent agnt
    LEFT JOIN mjr_transaction_item txi ON agnt.code = txi.src_id 
    OR agnt.code = txi.dest_id
WHERE
    CONCAT (( date_trunc( 'month', to_timestamp( '2019-04-01', 'YYYY-MM-DD' )) + INTERVAL '1 month' - INTERVAL '1 day' ) :: DATE, ' 15:00:00' ) :: TIMESTAMP >= txi.txn_date 
    AND txi.txn_date >= CONCAT (( date_trunc( 'month', to_timestamp( '2019-04-01', 'YYYY-MM-DD' )) :: DATE - 1 ), ' 15:00:00' ) :: TIMESTAMP 
GROUP BY
    agnt.ID 

查询计划

GroupAggregate  (cost=1000.27..52530.69 rows=86 width=48) (actual time=779.221..14716.528 rows=400 loops=1)
  Group Key: agnt.id
  Buffers: shared hit=10729
  ->  Nested Loop  (cost=1000.27..52523.53 rows=812 width=48) (actual time=759.898..14708.292 rows=10132 loops=1)
        Join Filter: (((agnt.code)::text = (txi.src_id)::text) OR ((agnt.code)::text = (txi.dest_id)::text))
        Rows Removed by Join Filter: 65321568
        Buffers: shared hit=10729
        ->  Index Scan using mjr_agent_pkey on mjr_agent agnt  (cost=0.27..189.56 rows=86 width=66) (actual time=0.045..1.827 rows=700 loops=1)
              Buffers: shared hit=92
        ->  Materialize  (cost=1000.00..50911.10 rows=947 width=156) (actual time=0.001..5.658 rows=93331 loops=700)
              Buffers: shared hit=10637
              ->  Gather  (cost=1000.00..50906.37 rows=947 width=156) (actual time=0.809..616.836 rows=93331 loops=1)
                    Workers Planned: 4
                    Workers Launched: 4
                    Buffers: shared hit=10637
                    ->  Parallel Seq Scan on mjr_transaction_item txi  (cost=0.00..49811.67 rows=237 width=156) (actual time=0.150..625.545 rows=18666 loops=5)
                          Filter: ((txn_date >= (concat(((date_trunc('month'::text, to_timestamp('2019-04-01'::text, 'YYYY-MM-DD'::text)))::date - 1), ' 15:00:00'))::timestamp without time zone) AND ((concat((((date_trunc('month'::text, to_timestamp('2019-04-01'::text, 'YYYY-MM-DD'::text)) + '1 mon'::interval) - '1 day'::interval))::date, ' 15:00:00'))::timestamp without time zone >= txn_date))
                          Rows Removed by Filter: 208542
                          Buffers: shared hit=47351
Planning time: 0.582 ms
Execution time: 14718.609 ms

硬件规格

Cent OS 7.6,内存32 GB,8 CPU,Postgres 10.0

平均查询执行时间约为 12 秒,这对于我的应用程序是不可接受的。我没有太多的数据库调优经验,而且我也不擅长编写SQL。因此,我要求经验丰富的开发人员提供所有建议。任何建议将不胜感激。

1 个答案:

答案 0 :(得分:1)

一个带有OR条件的(左)JOIN通常没有很好地优化。但是您可以做的是在两列的array上创建一个索引,然后将连接条件更改为使用array operator

首先创建索引:

create index on mjr_transaction_item using gin ( (array[src_id, dest_id]) );

然后更改JOIN条件:

...
FROM mjr_agent agnt
  LEFT JOIN mjr_transaction_item txi ON array[agnt.code] <@ array[txi.src_id, txi.dest_id] 
WHERE ...

您也可以使用ANY条件,但我认为不会使用索引:

...
FROM mjr_agent agnt
  LEFT JOIN mjr_transaction_item txi ON agnt.code = ANY(array[txi.src_id, txi.dest_id]) 
WHERE ...