Question

My query runs too slowly.

select c.vm_name,
       round(sum(bytes_sent)*1.8/power(10,9)) gb_sent,
       round(sum(bytes_received)*1.8/power(10,9)) gb_received
  from groups b, 
       vms c, 
       vm_ip_address_histories d, 
       ip_address_usage_histories e
 where b.group_id = c.group_id
   and c.vm_id = d.vm_id
   and d.ip_address = e.ip_address
   and e.datetime >= firstday()
   and d.allocation_date <= last_day(sysdate()) and (d.deallocation_date is null or d.deallocation_date > last_day(sysdate()))
   and b.customer_id = 29
 group by c.vm_name
 order by 1;

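The function sysdate() returns the current system timestamp without time zone, and last_day() returns a timestamp representing the last day of the month. I created these because Hibernate doesn't like the Postgres notation.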
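For readers who want to reproduce the problem, here is a minimal sketch of what such helpers might look like. The bodies are assumptions, not the asker's actual code; they are written to agree with the plan in the answer, where last_day(sysdate()) evaluates to midnight on the last day of the month. Like firstday() in Update 1, they carry no volatility label, so PostgreSQL treats them as VOLATILE by default.

```sql
-- Hypothetical definitions; the asker's real function bodies are not shown.
create or replace function sysdate() returns timestamp without time zone as $$
begin
   -- start time of the current transaction, without time zone
   return localtimestamp;
end; $$
language plpgsql;

create or replace function last_day(ts timestamp without time zone)
   returns timestamp without time zone as $$
begin
   -- first day of ts's month, plus one month, minus one day
   return date_trunc('month', ts) + interval '1 month' - interval '1 day';
end; $$
language plpgsql;
```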

The problem is that the planner is doing full table scans even where indexes exist. Here is the explain plan for the query above:

    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1326387.13..1326391.38 rows=1698 width=24) (actual time=13221.041..13221.042 rows=7 loops=1)
   Sort Key: c.vm_name
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=1326236.61..1326296.04 rows=1698 width=24) (actual time=13221.008..13221.026 rows=7 loops=1)
         Group Key: c.vm_name
         ->  Hash Join  (cost=1309056.97..1325972.10 rows=35268 width=24) (actual time=13131.323..13211.612 rows=13631 loops=1)
               Hash Cond: (d.ip_address = e.ip_address)
               ->  Nested Loop  (cost=2.97..6942.24 rows=79 width=15) (actual time=0.249..56.904 rows=192 loops=1)
                     ->  Hash Join  (cost=2.69..41.02 rows=98 width=16) (actual time=0.066..0.638 rows=61 loops=1)
                           Hash Cond: (c.group_id = b.group_id)
                           ->  Seq Scan on vms c  (cost=0.00..30.98 rows=1698 width=24) (actual time=0.009..0.281 rows=1698 loops=1)
                           ->  Hash  (cost=2.65..2.65 rows=3 width=8) (actual time=0.014..0.014 rows=4 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                 ->  Seq Scan on groups b  (cost=0.00..2.65 rows=3 width=8) (actual time=0.004..0.011 rows=4 loops=1)
                                       Filter: (customer_id = 29)
                                       Rows Removed by Filter: 49
                     ->  Index Scan using xif1vm_ip_address_histories on vm_ip_address_histories d  (cost=0.29..70.34 rows=8 width=15) (actual time=0.011..0.921 rows=3 loops=61)
                           Index Cond: (vm_id = c.vm_id)
                           Filter: ((allocation_date <= last_day(sysdate())) AND ((deallocation_date IS NULL) OR (deallocation_date > last_day(sysdate()))))
                           Rows Removed by Filter: 84
               ->  Hash  (cost=1280129.06..1280129.06 rows=1575435 width=23) (actual time=13130.223..13130.223 rows=203702 loops=1)
                     Buckets: 8192  Batches: 32  Memory Usage: 422kB
                     ->  Seq Scan on ip_address_usage_histories e  (cost=0.00..1280129.06 rows=1575435 width=23) (actual time=0.205..13002.776 rows=203702 loops=1)
                           Filter: (datetime >= firstday())
                           Rows Removed by Filter: 4522813
 Planning time: 0.804 ms
 Execution time: 13221.155 ms
(27 rows)

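Note that the planner chooses to do very expensive full table scans on the largest tables, ip_address_usage_histories and vm_ip_address_histories. I have tried setting the configuration parameter enable_seqscan to off, but that made things worse: total execution time rose to 63 seconds.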
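For reference, enable_seqscan can be changed for the current session only, which makes it easy to compare plans without editing postgresql.conf. The sketch below uses a short stand-in query rather than the full one. Turning the setting off does not forbid sequential scans; it only makes the planner cost them very highly, so it is a diagnostic aid rather than a fix.

```sql
-- Discourage sequential scans for this session only.
set enable_seqscan = off;

-- Stand-in query: just the filter on the large table.
explain analyze
select count(*)
  from ip_address_usage_histories e
 where e.datetime >= firstday();

-- Restore the default planner behaviour.
reset enable_seqscan;
```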

Here are the descriptions of the tables above:

                             Table "ip_address_usage_histories"
           Column            |            Type             | Modifiers 
-----------------------------+-----------------------------+-----------
 ip_address_usage_history_id | bigint                      | not null
 datetime                    | timestamp without time zone | not null
 ip_address                  | inet                        | not null
 bytes_sent                  | bigint                      | not null
 bytes_received              | bigint                      | not null
Indexes:
    "ip_address_usage_histories_pkey" PRIMARY KEY, btree (ip_address_usage_history_id)
    "ip_address_usage_histories_datetime_ip_address_key" UNIQUE CONSTRAINT, btree (datetime, ip_address)
    "uk_mit6tbiu8k62vdae4tmtnwb3f" UNIQUE CONSTRAINT, btree (datetime, ip_address)

                          Table "vm_ip_address_histories"
          Column          |            Type             |                                         Modifiers                                          
--------------------------+-----------------------------+--------------------------------------------------------------------------------------------
 vm_ip_address_history_id | bigint                      | not null default nextval('vm_ip_address_histories_vm_ip_address_history_id_seq'::regclass)
 ip_address               | inet                        | not null
 allocation_date          | timestamp without time zone | not null
 deallocation_date        | timestamp without time zone | 
 vm_id                    | bigint                      | not null
Indexes:
    "vm_ip_address_histories_pkey" PRIMARY KEY, btree (vm_ip_address_history_id)
    "xie1vm_ip_address_histories" btree (replicate_date)
    "xif1vm_ip_address_histories" btree (vm_id)
Foreign-key constraints:
    "vm_ip_address_histories_vm_id_fkey" FOREIGN KEY (vm_id) REFERENCES vms(vm_id) ON DELETE RESTRICT

It seems Postgres has no query hints to guide the planner. I also tried the inner join ... on ... syntax in the from clause, but that didn't improve things either.
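The explicit-join variant mentioned here might look like the sketch below. It is a reconstruction from the original query, not the asker's exact text. For plain inner joins over this few tables, PostgreSQL's planner normally treats both spellings the same way, which fits the observation that the plan did not improve.

```sql
select c.vm_name,
       round(sum(e.bytes_sent)*1.8/power(10,9)) gb_sent,
       round(sum(e.bytes_received)*1.8/power(10,9)) gb_received
  from groups b
       inner join vms c on c.group_id = b.group_id
       inner join vm_ip_address_histories d on d.vm_id = c.vm_id
       inner join ip_address_usage_histories e on e.ip_address = d.ip_address
 where b.customer_id = 29
   and e.datetime >= firstday()
   and d.allocation_date <= last_day(sysdate())
   and (d.deallocation_date is null or d.deallocation_date > last_day(sysdate()))
 group by c.vm_name
 order by 1;
```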

Update 1

create or replace function firstday() returns timestamp without time zone as $$
begin
   return date_trunc('month',now()::timestamp without time zone)::timestamp without time zone;
end; $$
language plpgsql;

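I haven't tried replacing this function with a standard one because, as far as I know, Postgres has no function that returns the first day of the month.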
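As it happens, the built-in date_trunc, which firstday() already wraps, can be used directly in a query. Below is a minimal sketch that computes both month boundaries with standard functions only; the column aliases are illustrative.

```sql
-- First and last day of the current month, both at midnight.
select date_trunc('month', localtimestamp) as first_day_of_month,
       date_trunc('month', localtimestamp)
          + interval '1 month' - interval '1 day' as last_day_of_month;

-- Used inline as a filter, with no wrapper function:
select count(*)
  from ip_address_usage_histories e
 where e.datetime >= date_trunc('month', localtimestamp);
```

Because localtimestamp is stable and date_trunc on a timestamp without time zone is immutable, the planner can treat the expression as a single value for the whole statement, so it should be usable as an index condition on datetime.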

Answer 1

The following was embedded in the question, but it serves as an answer.

After changing all my functions to immutable, the query now runs in under 200 ms! All the right things are happening.

                              QUERY PLAN                                                                                                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=51865.24..51914.88 rows=1103 width=24) (actual time=178.793..188.223 rows=7 loops=1)
   Group Key: c.vm_name
   ->  Sort  (cost=51865.24..51868.00 rows=1103 width=24) (actual time=178.517..180.541 rows=13823 loops=1)
         Sort Key: c.vm_name
         Sort Method: quicksort  Memory: 1464kB
         ->  Hash Join  (cost=50289.49..51809.50 rows=1103 width=24) (actual time=131.278..155.971 rows=13823 loops=1)
               Hash Cond: (d.ip_address = e.ip_address)
               ->  Nested Loop  (cost=2.97..272.36 rows=23 width=15) (actual time=0.149..2.310 rows=192 loops=1)
                     ->  Hash Join  (cost=2.69..41.02 rows=98 width=16) (actual time=0.046..0.590 rows=61 loops=1)
                           Hash Cond: (c.group_id = b.group_id)
                           ->  Seq Scan on vms c  (cost=0.00..30.98 rows=1698 width=24) (actual time=0.006..0.250 rows=1698 loops=1)
                           ->  Hash  (cost=2.65..2.65 rows=3 width=8) (actual time=0.014..0.014 rows=4 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                 ->  Seq Scan on groups b  (cost=0.00..2.65 rows=3 width=8) (actual time=0.004..0.012 rows=4 loops=1)
                                       Filter: (customer_id = 29)
                                       Rows Removed by Filter: 49
                     ->  Index Scan using xif1vm_ip_address_histories on vm_ip_address_histories d  (cost=0.29..2.34 rows=2 width=15) (actual time=0.002..0.027 rows=3 loops=61)
                           Index Cond: (vm_id = c.vm_id)
                           Filter: ((allocation_date <= '2015-03-31 00:00:00'::timestamp without time zone) AND ((deallocation_date IS NULL) OR (deallocation_date > '2015-03-31 00:00:00'::timestamp without time zone)))
                           Rows Removed by Filter: 84
               ->  Hash  (cost=46621.83..46621.83 rows=199575 width=23) (actual time=130.762..130.762 rows=206266 loops=1)
                     Buckets: 8192  Batches: 4  Memory Usage: 2818kB
                     ->  Bitmap Heap Scan on ip_address_usage_histories e  (cost=4627.14..46621.83 rows=199575 width=23) (actual time=18.335..69.763 rows=206266 loops=1)
                           Recheck Cond: (datetime >= '2015-03-01 00:00:00'::timestamp without time zone)
                           Heap Blocks: exact=3684
                           ->  Bitmap Index Scan on uk_mit6tbiu8k62vdae4tmtnwb3f  (cost=0.00..4577.24 rows=199575 width=0) (actual time=17.797..17.797 rows=206935 loops=1)
                                 Index Cond: (datetime >= '2015-03-01 00:00:00'::timestamp without time zone)
 Planning time: 0.837 ms
 Execution time: 188.301 ms
(29 rows)

I can now see that the planner evaluates the functions and substitutes their values into the where clause, which lets it use the index.
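The volatility change can be applied without re-creating the functions. The sketch below assumes the signatures implied by the query (firstday() and sysdate() take no arguments, last_day takes a timestamp without time zone); adjust the argument types to match the real definitions. One caveat: the PostgreSQL documentation recommends STABLE rather than IMMUTABLE for functions whose result depends on the current time, because an IMMUTABLE call can be folded into a constant in a cached plan and go stale. STABLE functions may still be used in index scan conditions, so that option would likely keep the index in use, although the plan would then show the function call instead of a literal timestamp.

```sql
-- Option A, what the answer describes: mark the helpers IMMUTABLE.
alter function firstday() immutable;
alter function sysdate() immutable;
alter function last_day(timestamp without time zone) immutable;

-- Option B, a more conservative alternative for the time-dependent helpers
-- (run instead of the first two statements of option A, not in addition).
alter function firstday() stable;
alter function sysdate() stable;

-- Check the result: 'i' = immutable, 's' = stable, 'v' = volatile.
select proname, provolatile
  from pg_proc
 where proname in ('firstday', 'sysdate', 'last_day');
```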

PostgreSQL 9.4 query tuning
