Slow Postgres query execution time with multi-column indexes

Date: 2019-05-13 04:24:11

Tags: postgresql postgresql-performance

We are running a PostgreSQL 9.6.11 database on Amazon RDS. One of our queries takes 6633.645 ms to execute, which seems slow. What changes can I make to reduce this query's execution time?

The query selects 3 columns (platform, publisher_platform, adset_id), filtering on account_id and a date range.

select
    platform,
    publisher_platform,
    adset_id
FROM "adsets"
WHERE
    (("adsets"."account_id" IN ('1595321963838425', '1320001405', 'urn:li:sponsoredAccount:507697540')) AND
    ("adsets"."date" >= '2019-05-06 00:00:00.000000+0000') AND ("adsets"."date" <= '2019-05-13 23:59:59.999999+0000'))
GROUP BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id"
ORDER BY
    "adsets"."platform",
    "adsets"."publisher_platform",
    "adsets"."adset_id";

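The query runs against a table called adsets, which includes the following columns:

   account_id         | text
   campaign_id        | text
   adset_id           | text
   name               | text
   date               | timestamp without time zone
   publisher_platform | text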

plus 15 other columns, a mix of integer and text fields.

We have added the following indexes:

  1. "adsets_composite_unique_key" UNIQUE CONSTRAINT, btree (platform, account_id, campaign_id, adset_id, date, publisher_platform)
  2. "adsets_account_id_date_idx" btree (account_id DESC, date DESC) CLUSTER
  3. "adsets_account_id_index" btree (account_id)
  4. "adsets_adset_id_index" btree (adset_id)
  5. "adsets_campaign_id_index" btree (campaign_id)
  6. "adsets_name_index" btree (name)
  7. "adsets_platform_platform_id_publisher_platform" btree (account_id, platform, publisher_platform, adset_id)
  8. "idx_account_date_adsets" btree (account_id, date)
  9. "platform_pub_index" btree (platform, publisher_platform, adset_id)

work_mem on Postgres is set to 125MB.

EXPLAIN (ANALYZE) shows:

 Group  (cost=33447.55..33532.22 rows=8437 width=29) (actual time=6625.170..6633.062 rows=2807 loops=1)
   Group Key: platform, publisher_platform, adset_id
   ->  Sort  (cost=33447.55..33468.72 rows=8467 width=29) (actual time=6625.168..6629.271 rows=22331 loops=1)
         Sort Key: platform, publisher_platform, adset_id
         Sort Method: quicksort  Memory: 2513kB
         ->  Bitmap Heap Scan on adsets  (cost=433.63..32895.18 rows=8467 width=29) (actual time=40.003..6471.898 rows=22331 loops=1)
               Recheck Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
               Heap Blocks: exact=52907
               ->  Bitmap Index Scan on idx_account_date_adsets  (cost=0.00..431.51 rows=8467 width=0) (actual time=27.335..27.335 rows=75102 loops=1)
                     Index Cond: ((account_id = ANY ('{1595321963838425,1320001405,urn:li:sponsoredAccount:507697540}'::text[])) AND (date >= '2019-05-06 00:00:00'::timestamp without time zone) AND (date <= '2019-05-13 23:59:59.999999'::timestamp without time zone))
 Planning time: 5.380 ms
 Execution time: 6633.645 ms
(12 rows)

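In the plan above, almost all of the time is spent in the Bitmap Heap Scan (actual time 40 ms to 6472 ms) while it visits 52907 heap blocks; the Bitmap Index Scan itself takes about 27 ms. A minimal sketch for checking whether that time goes to fetching pages from outside shared_buffers is to rerun the same query with the BUFFERS option. This assumes you can run EXPLAIN ANALYZE on the RDS instance; keep in mind that ANALYZE really executes the query.

```sql
-- BUFFERS adds "Buffers: shared hit=... read=..." lines to each plan node.
-- "read" counts pages not found in shared_buffers, which had to be requested
-- from the OS (and possibly the disk).
EXPLAIN (ANALYZE, BUFFERS)
SELECT platform, publisher_platform, adset_id
FROM adsets
WHERE account_id IN ('1595321963838425', '1320001405',
                     'urn:li:sponsoredAccount:507697540')
  AND date >= '2019-05-06 00:00:00'
  AND date <= '2019-05-13 23:59:59.999999'
GROUP BY platform, publisher_platform, adset_id
ORDER BY platform, publisher_platform, adset_id;
```

Running it twice in a row can also be informative: if the second run is much faster, the first one was most likely dominated by cold-cache reads.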

2 Answers:

Answer 0 (score: 1)

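First, you are using GROUP BY without actually selecting any aggregates, so you might as well use SELECT DISTINCT in the query. Beyond that, this is the B-tree index you should probably be using: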

CREATE INDEX idx ON adsets (account_id, date, platform, publisher_platform,
    adset_id);

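The problem with your current index is that, although it does cover the columns you select, it does not involve the columns that appear in the WHERE clause. This means Postgres may even choose not to use the index at all and scan the whole table instead.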

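Note that my suggestion still does not address the DISTINCT part of the query, but it should at least speed up everything that happens before that step.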

Here is your updated query:

SELECT DISTINCT
    platform,
    publisher_platform,
    adset_id
FROM adsets
WHERE
    account_id IN ('1595321963838425', '1320001405',
                   'urn:li:sponsoredAccount:507697540') AND
    date >= '2019-05-06' AND date < '2019-05-14';
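A minimal sketch of applying the suggestion above, assuming PostgreSQL 9.6 as in the question. The index name adsets_account_date_covering_idx is hypothetical. 9.6 has no INCLUDE clause for covering indexes (that arrived in PostgreSQL 11), so the selected columns go into the index key itself, as in the answer. An index-only scan can skip the heap only for pages the visibility map marks as all-visible, and VACUUM is what maintains that map, so the sketch vacuums before checking the plan.

```sql
-- Build the suggested index without blocking writes on adsets.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
-- The index name is a hypothetical placeholder.
CREATE INDEX CONCURRENTLY adsets_account_date_covering_idx
    ON adsets (account_id, date, platform, publisher_platform, adset_id);

-- Refresh the visibility map (needed for index-only scans) and statistics.
VACUUM (ANALYZE) adsets;

-- Check which plan the rewritten query gets now.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT platform, publisher_platform, adset_id
FROM adsets
WHERE account_id IN ('1595321963838425', '1320001405',
                     'urn:li:sponsoredAccount:507697540')
  AND date >= '2019-05-06' AND date < '2019-05-14';
```

If the planner picks the new index, the plan should show an Index Only Scan node, whose Heap Fetches count shows how many rows still had to be checked in the heap. Whether the planner actually chooses it depends on the table statistics, so this is something to verify rather than assume.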

Answer 1 (score: 0)

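Your problem is the many "false positives" that the bitmap index scan finds and the bitmap heap scan then discards: the index scan returns 75102 rows, but only 22331 survive the heap scan. Since there is no additional filter, my guess is that the extra rows are removed because they are not visible (that is, they are dead row versions).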

Check whether running VACUUM adsets improves the query's performance.
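A minimal sketch for testing this hypothesis before and after vacuuming, using the standard pg_stat_user_tables statistics view. The autovacuum_vacuum_scale_factor value in the last statement is an illustrative assumption, not a tuned recommendation (the core PostgreSQL default is 0.2; RDS parameter groups may differ).

```sql
-- How many dead row versions does the table carry, and when was it last vacuumed?
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'adsets';

-- VERBOSE reports how many dead row versions were removed.
VACUUM (VERBOSE, ANALYZE) adsets;

-- Optional, hypothetical value: let autovacuum process this table after
-- roughly 5% of its rows change instead of the default 20%.
ALTER TABLE adsets SET (autovacuum_vacuum_scale_factor = 0.05);
```

If n_dead_tup is large relative to n_live_tup and the query gets faster after the VACUUM, that supports the explanation above.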