如何正确创建postgres日期索引?

时间:2010-08-25 14:40:47

标签: postgresql django-orm

我正在使用Django ORM和postgresql。

ORM创建一个查询:

SELECT 
    (date_part('month', stat_date)) AS "stat_date", 
    "direct_keywordstat"."banner_id", 
    SUM("direct_keywordstat"."total") AS "total", 
    SUM("direct_keywordstat"."clicks") AS "clicks", 
    SUM("direct_keywordstat"."shows") AS "shows" 

FROM "direct_keywordstat" 

LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id") 
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id") 

WHERE (
     "direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
    "direct_keywordstat"."keyword_id" IN (

        SELECT U0."id" 
        FROM "direct_keyword" U0 
        INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id") 
        INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id") 
        INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
        INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
        WHERE (
            U0."deleted" = E'False' AND 
            U0."low_ctr" = E'False' AND 
            U4."status_active" = E'True' AND 
            U0."banner_id" IN (

                SELECT U0."banner_ptr_id" 
                FROM "direct_banner" U0 
                INNER JOIN "platforms_banner" U1 
                ON (U0."banner_ptr_id" = U1."id") 
                WHERE (
                    U0."status_show" = E'True' AND 
                    U1."campaign_id" = E'174' )
            )
        )
    )
) 

GROUP BY 
    "direct_keywordstat"."banner_id", 
    (date_part('month', stat_date)), 
    "platforms_banner"."title", date_trunc('month', stat_date) 

ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC

问题是,direct_keywordstat包含3mln +记录,因此查询在~15秒内执行。

我尝试过创建像

这样的索引
CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);

但EXPLAIN ANALYZE显示未使用索引。

表架构:

\ d direct_keywordstat

                                   Table "public.direct_keywordstat"
   Column    |          Type          |                            Modifiers                            
-------------+------------------------+-----------------------------------------------------------------
 id          | integer                | not null default nextval('direct_keywordstat_id_seq'::regclass)
 keyword_id  | integer                | not null
 banner_id   | integer                | not null
 campaign_id | integer                | not null
 stat_date   | date                   | not null
 region_id   | integer                | not null
 place_type  | character varying(30)  | 
 place_name  | character varying(100) | 
 clicks      | integer                | not null default 0
 shows       | integer                | not null default 0
 total       | numeric(19,6)          | not null

如何创建有用的索引?

或者,也许,有机会以其他方式优化此查询?

事情是,如果WHERE看起来像

"direct_keywordstat".clicks BETWEEN 10 AND 3000000

查询在0.8秒内执行。

2 个答案:

答案 0 :(得分:1)

您是否在这些列上有索引:

direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date

direct_keywordstat中的两列可以组合在一个索引中,只需检查

即可

这也是一个问题:

  

排序方法:外部合并磁盘:   20600kB

检查work_mem的设置,此查询至少需要20MB。

答案 1 :(得分:0)

                                                                                                                       QUERY PLAN                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
   ->  Sort  (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
         Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
         Sort Method:  external merge  Disk: 20600kB
         ->  Hash Join  (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
               Hash Cond: (direct_keywordstat.keyword_id = u0.id)
               ->  Hash Left Join  (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
                     Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
                     ->  Seq Scan on direct_keywordstat  (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
                           Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
                     ->  Hash  (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
                           ->  Hash Left Join  (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
                                 Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
                                 ->  Seq Scan on direct_banner  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
                                 ->  Hash  (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
                                       ->  Seq Scan on platforms_banner  (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
               ->  Hash  (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
                     ->  HashAggregate  (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
                           ->  Hash Join  (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
                                 Hash Cond: (u0.banner_id = u1.banner_ptr_id)
                                 ->  Nested Loop  (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
                                       ->  HashAggregate  (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
                                             ->  Nested Loop  (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
                                                   ->  Index Scan using platforms_banner_campaign_id on platforms_banner u1  (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
                                                         Index Cond: (campaign_id = 174)
                                                   ->  Index Scan using direct_banner_pkey on direct_banner u0  (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
                                                         Index Cond: (u0.banner_ptr_id = u1.id)
                                                         Filter: u0.status_show
                                       ->  Index Scan using direct_keyword_banner_id on direct_keyword u0  (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
                                             Index Cond: (u0.banner_id = u0.banner_ptr_id)
                                             Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
                                 ->  Hash  (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
                                       ->  Hash Join  (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
                                             Hash Cond: (u1.banner_ptr_id = u2.id)
                                             ->  Seq Scan on direct_banner u1  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
                                             ->  Hash  (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
                                                   ->  Hash Join  (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
                                                         Hash Cond: (u2.campaign_id = u3.id)
                                                         ->  Seq Scan on platforms_banner u2  (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
                                                         ->  Hash  (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
                                                               ->  Hash Join  (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
                                                                 Hash Cond: (u3.id = u4.campaign_ptr_id)
                                                                 ->  Seq Scan on platforms_campaign u3  (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
                                                                 ->  Hash  (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
                                                                       ->  Seq Scan on direct_campaign u4  (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
                                                                             Filter: status_active

总运行时间:23436.715 ms (47行)

这是