Postgres: optimizing a query with "WHERE id IN (...)"

Posted: 2019-02-11 17:13:04

Tags: postgresql query-optimization postgresql-10

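I have a table (over 2M records) that keeps track of a ledger. Some entries add points and others subtract points (there are only these two kinds of entries). An entry that subtracts points always references the (adding) entry it subtracts from via referenceentryid. Adding entries always have NULL in referenceentryid.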

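The table has a dead column, which a worker sets to true when an addition is used up or has expired, or when a subtraction points to a "dead" addition. Because the table has partial indexes on dead = false, SELECTs on live rows are very fast.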

My problem is the performance of the worker that sets dead to TRUE.

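The flow is:
1. Get one entry per addition, indicating the amount added, the amount subtracted, and whether the addition has expired.
2. Filter out the entries that are not expired and whose addition still exceeds their subtractions.
3. Set dead = true on every row whose id or referenceentryid is in the filtered set of entries.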

WITH entries AS 
(
    SELECT 
        additions.id AS id,
        SUM(subtractions.amount) AS subtraction,
        additions.amount AS addition,
        additions.expirydate <= now() AS expired
    FROM 
        loyalty_ledger AS subtractions
    INNER JOIN 
        loyalty_ledger AS additions
    ON 
        additions.id = subtractions.referenceentryid
    WHERE
        subtractions.dead = FALSE
        AND subtractions.referenceentryid IS NOT NULL
    GROUP BY 
        subtractions.referenceentryid, additions.id
), dead_entries AS (
    SELECT
        id
    FROM
        entries
    WHERE
        subtraction >= addition OR expired = TRUE
)
-- THE SLOW BIT:
SELECT
    *
FROM 
    loyalty_ledger AS ledger
WHERE
    ledger.dead = FALSE AND
    (ledger.id IN (SELECT id FROM dead_entries) OR ledger.referenceentryid IN (SELECT id FROM dead_entries));

In the query above, the inner part runs quite fast (a few seconds), while the last part runs forever.

Here is the table definition and the indexes I have on it:

CREATE TABLE IF NOT EXISTS loyalty_ledger (
        id SERIAL PRIMARY KEY,
        programid bigint NOT NULL,   
        FOREIGN KEY (programid) REFERENCES loyalty_programs(id) ON DELETE CASCADE,
        referenceentryid    bigint,
        FOREIGN KEY (referenceentryid) REFERENCES loyalty_ledger(id) ON DELETE CASCADE,
        customerprofileid bigint NOT NULL,
        FOREIGN KEY (customerprofileid) REFERENCES customer_profiles(id) ON DELETE CASCADE,
        amount int NOT NULL,
        expirydate TIMESTAMPTZ,
        dead boolean DEFAULT false,
        expired boolean DEFAULT false
);

CREATE index loyalty_ledger_referenceentryid_idx ON loyalty_ledger (referenceentryid) WHERE dead = false;
CREATE index loyalty_ledger_customer_program_idx ON loyalty_ledger (customerprofileid, programid) WHERE dead = false;

I'm trying to optimize the last part of the query. EXPLAIN gives me the following:

"Index Scan using loyalty_ledger_referenceentryid_idx on loyalty_ledger ledger  (cost=103412.24..4976040812.22 rows=986583 width=67)"
"  Filter: ((SubPlan 3) OR (SubPlan 4))"
"  CTE entries"
"    ->  GroupAggregate  (cost=1.47..97737.83 rows=252177 width=25)"
"          Group Key: subtractions.referenceentryid, additions.id"
"          ->  Merge Join  (cost=1.47..91390.72 rows=341928 width=28)"
"                Merge Cond: (subtractions.referenceentryid = additions.id)"
"                ->  Index Scan using loyalty_ledger_referenceentryid_idx on loyalty_ledger subtractions  (cost=0.43..22392.56 rows=341928 width=12)"
"                      Index Cond: (referenceentryid IS NOT NULL)"
"                ->  Index Scan using loyalty_ledger_pkey on loyalty_ledger additions  (cost=0.43..80251.72 rows=1683086 width=16)"
"  CTE dead_entries"
"    ->  CTE Scan on entries  (cost=0.00..5673.98 rows=168118 width=4)"
"          Filter: ((subtraction >= addition) OR expired)"
"  SubPlan 3"
"    ->  CTE Scan on dead_entries  (cost=0.00..3362.36 rows=168118 width=4)"
"  SubPlan 4"
"    ->  CTE Scan on dead_entries dead_entries_1  (cost=0.00..3362.36 rows=168118 width=4)"

The last part of the query seems very inefficient. Any ideas on how to speed it up?

3 answers:

Answer 0 (score: 1)

For large datasets, I've found that semi-joins perform much better than IN lists:

-- entries and dead_entries CTEs as in the question
select
  *
from
  loyalty_ledger as ledger
WHERE
    ledger.dead = FALSE AND (
    exists (
      select null
      from dead_entries d
      where d.id = ledger.id
      ) or
    exists (
      select null
      from dead_entries d
      where d.id = ledger.referenceentryid
      )
    )

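Honestly, I'm not sure, but I think these are also worth a try. They are less code and more intuitive, but there is no guarantee they will perform better: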

WHERE ledger.dead = FALSE AND
exists (
  select null
  from dead_entries d
  where d.id = ledger.id or d.id = ledger.referenceentryid 
)

WHERE ledger.dead = FALSE AND
exists (
  select null
  from dead_entries d
  where d.id in (ledger.id, ledger.referenceentryid) 
)
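To see the first semi-join form doing the worker's actual job, here is a minimal sketch that applies it as the UPDATE on a hypothetical toy table (ledger_demo, with made-up rows standing in for loyalty_ledger), keeping the question's CTEs unchanged. It only shows that the rewrite marks the same rows; whether it is faster on 2M rows is something only EXPLAIN ANALYZE on the real data can tell. Also note that PostgreSQL generally turns EXISTS into a true semi-join only when it is a top-level AND condition; under OR it tends to stay a subplan, so the plan may change less than hoped.

```sql
-- Hypothetical toy table standing in for loyalty_ledger (only the columns the worker reads)
CREATE TEMP TABLE ledger_demo (
    id               int PRIMARY KEY,
    referenceentryid int,
    amount           int NOT NULL,
    expirydate       timestamptz,
    dead             boolean NOT NULL DEFAULT false
);

INSERT INTO ledger_demo (id, referenceentryid, amount, expirydate) VALUES
    (1, NULL, 100, now() + interval '30 days'),  -- addition, 40 of 100 points used
    (2, 1,     40, NULL),                        -- subtraction from 1
    (3, NULL,  50, now() + interval '30 days'),  -- addition, fully used
    (4, 3,     50, NULL);                        -- subtraction from 3

-- The question's CTEs, unchanged apart from the table name
WITH entries AS (
    SELECT additions.id AS id,
           SUM(subtractions.amount) AS subtraction,
           additions.amount AS addition,
           additions.expirydate <= now() AS expired
    FROM ledger_demo AS subtractions
    INNER JOIN ledger_demo AS additions ON additions.id = subtractions.referenceentryid
    WHERE subtractions.dead = FALSE AND subtractions.referenceentryid IS NOT NULL
    GROUP BY subtractions.referenceentryid, additions.id
), dead_entries AS (
    SELECT id FROM entries WHERE subtraction >= addition OR expired = TRUE
)
-- Semi-join form of the slow step, used as the worker's UPDATE
UPDATE ledger_demo AS ledger
SET dead = TRUE
WHERE ledger.dead = FALSE
  AND (EXISTS (SELECT 1 FROM dead_entries AS d WHERE d.id = ledger.id)
       OR EXISTS (SELECT 1 FROM dead_entries AS d WHERE d.id = ledger.referenceentryid));
```

Afterwards, addition 3 and its subtraction 4 are dead, while addition 1 (60 points left) and subtraction 2 stay live.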

Answer 1 (score: 0)

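What finally helped me was doing the id IN filtering in a second WITH step (dead_rows) and replacing IN with ANY syntax. Here the dead_rows step joins the ledger against the filtered entries on either id or referenceentryid: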

        WITH entries AS 
        (
            SELECT 
                additions.id AS id,
                additions.amount - coalesce(SUM(subtractions.amount),0) AS balance,
                additions.expirydate <= now() AS passed_expiration
            FROM 
                loyalty_ledger AS additions
            LEFT JOIN 
                loyalty_ledger AS subtractions
            ON 
                subtractions.dead = FALSE AND
                additions.id = subtractions.referenceentryid
            WHERE
                additions.dead = FALSE AND additions.referenceentryid IS NULL
            GROUP BY 
                subtractions.referenceentryid, additions.id
        ), dead_rows AS (
            SELECT
                l.id AS id,
                -- only additions that still have usable points can expire
                l.referenceentryid IS NULL AND e.balance > 0 AND e.passed_expiration AS expired
            FROM
                loyalty_ledger AS l
            INNER JOIN
                entries AS e
            ON
                (l.id = e.id OR l.referenceentryid = e.id)
            WHERE
                l.dead = FALSE AND
                (e.balance <= 0 OR e.passed_expiration)
            ORDER BY e.balance DESC
        )
        UPDATE
            loyalty_ledger AS l
        SET 
            (dead, expired) = (TRUE, d.expired)
        FROM 
            dead_rows AS d
        WHERE
            l.id = d.id AND
            l.dead = FALSE;
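Compared with the question's query, this version starts from the additions and LEFT JOINs their live subtractions, with coalesce turning "no subtractions" into 0. One visible effect: an addition that expired without ever being drawn on still gets a row in entries, whereas the question's INNER JOIN drops it, so it would never be marked dead. A minimal check on hypothetical inline rows (not from the thread; the dead filter is omitted for brevity):

```sql
-- Hypothetical rows: addition 5 expired without any points ever being subtracted from it
WITH ledger (id, referenceentryid, amount, expirydate) AS (
    VALUES (1, NULL::int, 100, now() + interval '30 days'),
           (2, 1,          40, NULL::timestamptz),
           (5, NULL,       70, now() - interval '1 day')
)
SELECT additions.id,
       additions.amount - COALESCE(SUM(subtractions.amount), 0) AS balance,
       additions.expirydate <= now() AS passed_expiration
FROM ledger AS additions
LEFT JOIN ledger AS subtractions ON additions.id = subtractions.referenceentryid
WHERE additions.referenceentryid IS NULL
GROUP BY additions.id, additions.amount, additions.expirydate
ORDER BY additions.id;
-- id 1: balance 60, not expired; id 5: balance 70, expired.
-- With INNER JOIN instead of LEFT JOIN, the row for id 5 would be missing.
```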

Answer 2 (score: 0)

I also believe that

-- THE SLOW BIT:
SELECT
    *
FROM 
    loyalty_ledger AS ledger
WHERE
    ledger.dead = FALSE AND
    (ledger.id IN (SELECT id FROM dead_entries) OR ledger.referenceentryid IN (SELECT id FROM dead_entries));

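can be rewritten with a JOIN and UNION ALL, which will most likely produce a different execution plan and may be faster.
But without more of the table structure, it's hard to say for sure.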

SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT id FROM dead_entries) AS dead_entries
ON ledger.id = dead_entries.id AND ledger.dead = FALSE

UNION ALL 

SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT id FROM dead_entries) AS dead_entries
ON ledger.referenceentryid = dead_entries.id AND ledger.dead = FALSE
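One thing to keep in mind with the UNION ALL rewrite: a row that satisfied both join conditions would be returned twice, whereas the IN ... OR version returns it once. In this ledger that should not happen, since additions have a NULL referenceentryid and dead_entries only holds addition ids, but that is an assumption worth checking (UNION instead of UNION ALL would deduplicate, at the cost of a sort or hash). A small check on hypothetical inline rows, with addition 3 assumed dead:

```sql
-- Hypothetical rows: additions 1 and 3, subtractions 2 (from 1) and 4 (from 3)
WITH ledger (id, referenceentryid) AS (
    VALUES (1, NULL::int), (2, 1), (3, NULL), (4, 3)
), dead_entries (id) AS (
    VALUES (3)
)
-- One plain join per branch of the original OR
SELECT l.id FROM ledger AS l JOIN dead_entries AS d ON l.id = d.id
UNION ALL
SELECT l.id FROM ledger AS l JOIN dead_entries AS d ON l.referenceentryid = d.id
ORDER BY id;
-- returns 3 and 4, the same rows as the IN ... OR form
```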

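And because CTEs in PostgreSQL are materialized and not indexed (this holds for PostgreSQL 10, which the question uses; since PostgreSQL 12 a side-effect-free CTE referenced only once can be inlined), you might be better off removing the dead_entries CTE and repeating its query outside the CTE: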

-- entries CTE as in the question
SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT
    id
FROM
    entries
WHERE
    subtraction >= addition OR expired = TRUE) AS dead_entries
ON ledger.id = dead_entries.id AND ledger.dead = FALSE

UNION ALL 

SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT
    id
FROM
    entries
WHERE
    subtraction >= addition OR expired = TRUE) AS dead_entries
ON ledger.referenceentryid = dead_entries.id AND ledger.dead = FALSE
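A related option on PostgreSQL 10, not taken from the answers above: since a CTE cannot carry an index or statistics, the dead addition ids can be written once into a temporary table, given a primary key and ANALYZEd, and then used in two plain UPDATE ... FROM joins, one per branch of the original OR. The sketch below uses hypothetical names (ledger_tmp_demo, dead_ids_demo) and toy rows, and leaves out the expiry check for brevity; on real data the steps would normally run in one transaction.

```sql
-- Hypothetical toy table; expiry handling omitted to keep the sketch short
CREATE TEMP TABLE ledger_tmp_demo (
    id               int PRIMARY KEY,
    referenceentryid int,
    amount           int NOT NULL,
    dead             boolean NOT NULL DEFAULT false
);
INSERT INTO ledger_tmp_demo (id, referenceentryid, amount) VALUES
    (1, NULL, 100), (2, 1, 40),   -- addition 1 still has 60 points left
    (3, NULL, 50),  (4, 3, 50);   -- addition 3 is used up

-- Materialize the dead addition ids once, into a table that can have an index and statistics
CREATE TEMP TABLE dead_ids_demo AS
SELECT additions.id
FROM ledger_tmp_demo AS additions
JOIN ledger_tmp_demo AS subtractions
  ON subtractions.referenceentryid = additions.id AND subtractions.dead = FALSE
WHERE additions.dead = FALSE
GROUP BY additions.id, additions.amount
HAVING SUM(subtractions.amount) >= additions.amount;

ALTER TABLE dead_ids_demo ADD PRIMARY KEY (id);
ANALYZE dead_ids_demo;

-- Branch 1 of the OR: the dead additions themselves
UPDATE ledger_tmp_demo AS l SET dead = TRUE
FROM dead_ids_demo AS d
WHERE l.dead = FALSE AND l.id = d.id;

-- Branch 2 of the OR: subtractions that reference a dead addition
UPDATE ledger_tmp_demo AS l SET dead = TRUE
FROM dead_ids_demo AS d
WHERE l.dead = FALSE AND l.referenceentryid = d.id;
```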