Efficiently querying a PostgreSQL master table of partitions

Date: 2013-04-01 15:19:26

Tags: performance postgresql partitioning

I'm talking about this feature.

I have a master table:

logstore=# \d history_log
                                   Table "public.history_log"
  Column   |           Type           |                         Modifiers
-----------+--------------------------+-----------------------------------------------------------
 id        | bigint                   | NOT NULL DEFAULT nextval('history_log__id_seq'::regclass)
 tstamp    | timestamp with time zone | NOT NULL DEFAULT now()
 session   | character varying(40)    |
 action    | smallint                 | NOT NULL
 userid    | integer                  |
 urlid     | integer                  |
Indices:
    "history_log__id_pkey" PRIMARY KEY, btree (id)
Triggers:
    insert_history_log_trigger BEFORE INSERT ON history_log FOR EACH ROW EXECUTE PROCEDURE history_log_insert_trigger()

and a set of child tables partitioned by the tstamp column:

logstore=# \d history_log_201304
                               Table "public.history_log_201304"
  Column   |           Type           |                         Modifiers
-----------+--------------------------+-----------------------------------------------------------
 id        | bigint                   | NOT NULL DEFAULT nextval('history_log__id_seq'::regclass)
 tstamp    | timestamp with time zone | NOT NULL DEFAULT now()
 session   | character varying(40)    |
 action    | smallint                 | NOT NULL
 userid    | integer                  |
 urlid     | integer                  |
Indices:
    "history_log_201304_pkey" PRIMARY KEY, btree (id)
    "history_log_201304_tstamp" btree (tstamp)
    "history_log_201304_userid" btree (userid)
Constraints:
    "history_log_201304_tstamp_check" CHECK (tstamp >= '2013-04-01 00:00:00+04'::timestamp with time zone AND tstamp < '2013-05-01 00:00:00+04'::timestamp with time zone)
Inherits: history_log
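
For reference, a child table like the one above and its routing trigger could be created roughly as follows. This is only a sketch: it assumes the history_log master table already exists, and the question shows neither the DDL nor the body of history_log_insert_trigger(), so the function body and the error raised for out-of-range rows are assumptions.

```sql
-- Child partition for April 2013. The CHECK bounds are timestamptz literals,
-- matching the type of the tstamp column.
CREATE TABLE history_log_201304 (
    CONSTRAINT history_log_201304_pkey PRIMARY KEY (id),
    CONSTRAINT history_log_201304_tstamp_check
        CHECK (tstamp >= TIMESTAMPTZ '2013-04-01 00:00:00+04'
           AND tstamp <  TIMESTAMPTZ '2013-05-01 00:00:00+04')
) INHERITS (history_log);

CREATE INDEX history_log_201304_tstamp ON history_log_201304 (tstamp);
CREATE INDEX history_log_201304_userid ON history_log_201304 (userid);

-- Assumed body of the routing function (not shown in the question).
-- Only the April 2013 branch is written out here.
CREATE OR REPLACE FUNCTION history_log_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.tstamp >= TIMESTAMPTZ '2013-04-01 00:00:00+04'
       AND NEW.tstamp < TIMESTAMPTZ '2013-05-01 00:00:00+04' THEN
        INSERT INTO history_log_201304 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'tstamp % has no matching partition', NEW.tstamp;
    END IF;
    RETURN NULL;  -- the row is stored in the child, not in history_log
END;
$$ LANGUAGE plpgsql;
```

Returning NULL from the BEFORE INSERT trigger is what keeps the master table empty, so the Seq Scan on history_log itself costs almost nothing.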

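So here is my problem: when I run a query directly against a child table, with a WHERE condition on the tstamp column that the CHECK constraint covers, it runs very fast.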

logstore=# EXPLAIN SELECT userid FROM history_log_201304 WHERE tstamp >= (current_date - interval '3 days')::date::timestamptz AND tstamp < current_date::timestamptz AND action = 13;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using history_log_201304_tstamp on history_log_201304  (cost=0.01..8.37 rows=1 width=4)
   Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
   Filter: (action = 13)

But when I try to do the same on the master table, it goes to a Seq Scan:

logstore=# EXPLAIN SELECT userid FROM history_log WHERE tstamp >= (current_date - interval '3 days')::date::timestamptz AND tstamp < current_date::timestamptz AND action = 13;
                                                                    QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
---------------
 Result  (cost=0.00..253099.82 rows=1353838 width=4)
   ->  Append  (cost=0.00..253099.82 rows=1353838 width=4)
         ->  Seq Scan on history_log  (cost=0.00..0.00 rows=1 width=4)
               Filter: ((action = 13) AND (tstamp < ('now'::cstring)::date) AND (tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date))
         ->  Index Scan using history_log_201203_tstamp on history_log_201203 history_log  (cost=0.01..9.67 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201204_tstamp on history_log_201204 history_log  (cost=0.01..9.85 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201205_tstamp on history_log_201205 history_log  (cost=0.01..10.39 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201206_tstamp on history_log_201206 history_log  (cost=0.01..10.32 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201207_tstamp on history_log_201207 history_log  (cost=0.01..10.09 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201208_tstamp on history_log_201208 history_log  (cost=0.01..10.35 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201209_tstamp on history_log_201209 history_log  (cost=0.01..10.53 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201210_tstamp on history_log_201210 history_log  (cost=0.01..11.83 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201211_tstamp on history_log_201211 history_log  (cost=0.01..11.87 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201212_tstamp on history_log_201212 history_log  (cost=0.01..12.40 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201301_tstamp on history_log_201301 history_log  (cost=0.01..12.35 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201302_tstamp on history_log_201302 history_log  (cost=0.01..12.35 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201303_tstamp on history_log_201303 history_log  (cost=0.01..252959.45 rows=1353824 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)
         ->  Index Scan using history_log_201304_tstamp on history_log_201304 history_log  (cost=0.01..8.37 rows=1 width=4)
               Index Cond: ((tstamp >= ((('now'::cstring)::date - '3 days'::interval))::date) AND (tstamp < ('now'::cstring)::date))
               Filter: (action = 13)

What is happening here? Why is the query against the master table not as fast?

I have constraint_exclusion set to on.
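
As a quick check, the setting can be inspected and changed per session. The parameter accepts on, off, and partition. The partition value (the default since PostgreSQL 8.4) applies constraint exclusion only to inheritance children and UNION ALL subqueries, which is normally enough for a setup like this one.

```sql
-- Show the value in effect for the current session.
SHOW constraint_exclusion;

-- Restrict constraint exclusion to inheritance children and UNION ALL
-- subqueries, for this session only.
SET constraint_exclusion = partition;
```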

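EDIT: I found the solution by accident and am writing it here for readability.

Until today I had the wrong constraints: my tstamp column is of type timestamp WITH time zone, but the constraints were built on timestamp WITHOUT time zone. I fixed that, and fixed my queries to cast to the right type, but queries against the master table still took minutes instead of seconds. SO was my last resort. During the discussion I went to the DB and ran EXPLAIN ANALYZE against all the child tables to get some real numbers, and after that the query on the master table became fast!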
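
The constraint fix described above might look like the sketch below for one child table. The bounds are copied from the \d output in the question, and the same change would be needed on every child. The ANALYZE statements are an assumption about what may have helped: EXPLAIN ANALYZE runs the query but does not refresh planner statistics, whereas a plain ANALYZE does. The thread does not establish why the master-table query became fast.

```sql
BEGIN;

-- Replace the CHECK constraint with one whose bounds are timestamptz,
-- the same type as the tstamp column.
ALTER TABLE history_log_201304
    DROP CONSTRAINT history_log_201304_tstamp_check;

ALTER TABLE history_log_201304
    ADD CONSTRAINT history_log_201304_tstamp_check
    CHECK (tstamp >= TIMESTAMPTZ '2013-04-01 00:00:00+04'
       AND tstamp <  TIMESTAMPTZ '2013-05-01 00:00:00+04');

COMMIT;

-- Refresh planner statistics for the partitions the query touches.
ANALYZE history_log_201303;
ANALYZE history_log_201304;
```

Note that ADD CONSTRAINT scans the table to validate existing rows, so it can take a while on a large partition.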

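1 Answer:

Answer 0 (score: 1)

The queries should be equally fast. The seq scan is performed on the master table, which, in a correctly configured partitioning setup, should contain no rows at all.

Consider using EXPLAIN ANALYZE so you can see exactly how long the queries take. The difference between the two should be negligible.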
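
For example, the two queries from the question can be timed side by side. Keep in mind that EXPLAIN ANALYZE actually executes the statement, so on the master table it may itself take as long as the slow query.

```sql
-- Direct query against the April 2013 child table.
EXPLAIN ANALYZE
SELECT userid
FROM history_log_201304
WHERE tstamp >= (current_date - interval '3 days')::date::timestamptz
  AND tstamp <  current_date::timestamptz
  AND action = 13;

-- Same query through the master table.
EXPLAIN ANALYZE
SELECT userid
FROM history_log
WHERE tstamp >= (current_date - interval '3 days')::date::timestamptz
  AND tstamp <  current_date::timestamptz
  AND action = 13;
```

Comparing the "actual time" figures per node shows which child tables really contribute to the runtime, as opposed to the planner's cost estimates.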


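The actual problem seems to be that the query is executed on child tables that return no results. Presumably your question boils down to: why are child tables that cannot possibly satisfy the CHECK constraint still being searched?

There is a thread on the pgsql-bugs mailing list about this issue. Your tstamp column is timestamp with time zone. Because the expressions in the WHERE clause are date values rather than timestamps, the check cannot be used. Consider using CURRENT_TIMESTAMP instead of CURRENT_DATE. If you need to query from midnight, keep your current query but add a cast to exactly the same type as the tstamp column (::timestamp with time zone).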
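
One further point from the PostgreSQL partitioning documentation: constraint exclusion is applied at plan time, and it only works when the WHERE clause contains constants or externally supplied parameters. A non-immutable function such as CURRENT_TIMESTAMP cannot be used for it. A variant worth trying is therefore to compute the bounds in the application and pass them as timestamptz literals. The dates below are illustrative only: they assume "today" is 2013-04-01 and reuse the +04 offset from the CHECK constraints.

```sql
-- Bounds written as timestamptz constants, so the planner can compare them
-- with each child's CHECK constraint while planning.
EXPLAIN
SELECT userid
FROM history_log
WHERE tstamp >= TIMESTAMPTZ '2013-03-29 00:00:00+04'
  AND tstamp <  TIMESTAMPTZ '2013-04-01 00:00:00+04'
  AND action = 13;
```

If every child has a CHECK constraint like the one shown for history_log_201304, one would expect this plan to list only the master table and history_log_201303, instead of every partition.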