How to pre-warm / prepare table and index statistics?

Asked: 2016-12-06 19:00:53

Tags: sql postgresql indexing postgresql-9.5

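I have a batch process that finds the near_link for each avl position. The avl positions are random, but roughly normally distributed around the city. The problem is that the first batch takes a very long time, while later batches are much faster.

The map does not change, so my guess is that some statistics are being built up, because the same map is searched for x, y over and over.

So how can I help create those statistics before the batch starts? Or how can I check what is happening under the hood?

My concern is that I get these results when running the batch on its own. I am afraid that on the production server the statistics will not be as good, because of all the other requests hitting the map.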

-- Executing query:
SELECT * FROM avl_db.process_near_link();

NOTICE:  Duration in seconds= 163.4609 , Rows= 400
NOTICE:  Duration in seconds= 68.73396 , Rows= 400
NOTICE:  Duration in seconds= 36.93196 , Rows= 400
NOTICE:  Duration in seconds= 17.58829 , Rows= 400
NOTICE:  Duration in seconds= 12.94885 , Rows= 400
NOTICE:  Duration in seconds= 9.509757 , Rows= 400

Total query runtime: 05:09 minutes  -- 2400 rows
1 row retrieved.

-- Executing query:
SELECT * FROM avl_db.process_near_link();

NOTICE:  Duration in seconds= 8.03767 , Rows= 400
NOTICE:  Duration in seconds= 8.51031 , Rows= 400
NOTICE:  Duration in seconds= 5.45953 , Rows= 400
NOTICE:  Duration in seconds= 4.08547 , Rows= 400
NOTICE:  Duration in seconds= 4.19483 , Rows= 400
NOTICE:  Duration in seconds= 3.85986 , Rows= 400

Total query runtime: 34.1 secs -- 2400 rows
1 row retrieved.

-- Executing query:
SELECT * FROM avl_db.process_near_link();

NOTICE:  Duration in seconds= 3.66540 , Rows= 400
NOTICE:  Duration in seconds= 3.55134 , Rows= 400
NOTICE:  Duration in seconds= 3.17400 , Rows= 400
NOTICE:  Duration in seconds= 3.06982 , Rows= 400
NOTICE:  Duration in seconds= 2.96954 , Rows= 400
NOTICE:  Duration in seconds= 3.05310 , Rows= 400
NOTICE:  Duration in seconds= 2.88948 , Rows= 400
NOTICE:  Duration in seconds= 2.77269 , Rows= 400
NOTICE:  Duration in seconds= 2.88940 , Rows= 400
NOTICE:  Duration in seconds= 2.94150 , Rows= 400
NOTICE:  Duration in seconds= 2.84522 , Rows= 400
NOTICE:  Duration in seconds= 2.86770 , Rows= 400
NOTICE:  Duration in seconds= 2.74608 , Rows= 400

Total query runtime: 39.4 secs  -- 5200
1 row retrieved.

This is the batch query:

UPDATE avl_db.avl_pool a
SET near_link = map.get_near_link(sq.X, sq.Y, sq.AZIMUTH),
    has_link = true
FROM (
     SELECT avl_id, x, y, azimuth
     FROM avl_db.avl_pool
     WHERE NOT has_link
     ORDER BY avl_id
     LIMIT 400
    ) sq
    WHERE a.avl_id = sq.avl_id;

Explain Plan

"Update on avl_pool a  (cost=0.84..3395.28 rows=400 width=151) (actual time=2779.889..2779.889 rows=0 loops=1)"
"  ->  Nested Loop  (cost=0.84..3395.28 rows=400 width=151) (actual time=11.253..2738.711 rows=400 loops=1)"
"        ->  Subquery Scan on sq  (cost=0.42..34.28 rows=400 width=80) (actual time=6.882..8.496 rows=400 loops=1)"
"              ->  Limit  (cost=0.42..30.28 rows=400 width=28) (actual time=6.871..7.964 rows=400 loops=1)"
"                    ->  Index Scan using avl_pool_pkey on avl_pool  (cost=0.42..29185.30 rows=391017 width=28) (actual time=6.869..7.873 rows=400 loops=1)"
"                          Filter: (NOT has_link)"
"                          Rows Removed by Filter: 10800"
"        ->  Index Scan using avl_pool_pkey on avl_pool a  (cost=0.42..8.14 rows=1 width=79) (actual time=0.003..0.029 rows=1 loops=400)"
"              Index Cond: (avl_id = sq.avl_id)"
"Planning time: 0.372 ms"
"Execution time: 2779.970 ms"

1 answer:

Answer 0 (score: 0)

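I would say you are seeing the effects of caching.

During the first run the data has to be fetched from disk. Later runs benefit from data that is already cached (mostly blocks of the avl_pool_pkey index, but also table blocks that were accessed during previous updates).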
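If cold caches are indeed the cause, one possible way to load the relevant blocks before the batch starts is the pg_prewarm contrib module, which ships with PostgreSQL 9.4 and later. The following is only a sketch: it assumes the extension can be installed on the server, that the primary-key index lives in the avl_db schema alongside its table, and that shared_buffers is large enough to keep the blocks around. On a busy production server other queries may evict them again.

```sql
-- Assumption: the pg_prewarm contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the table and its primary-key index into shared buffers.
-- Each call returns the number of blocks that were prewarmed.
SELECT pg_prewarm('avl_db.avl_pool');
SELECT pg_prewarm('avl_db.avl_pool_pkey');

-- The tables and indexes read inside map.get_near_link() are not shown in
-- the question; they would need the same treatment, one call per relation.
```

Since most of the batch time is spent inside map.get_near_link(), the map relations it reads are probably the more important ones to warm.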

You can verify that caching is the cause with EXPLAIN (ANALYZE, BUFFERS), which shows the number of blocks read from disk and the number found in the cache.
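As a sketch, the batch query from the question could be checked like this. EXPLAIN ANALYZE really executes the UPDATE, so the example wraps it in a transaction and rolls it back. Wrapping it this way is my addition, not part of the original query.

```sql
BEGIN;

-- BUFFERS adds a "Buffers:" line to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE avl_db.avl_pool a
SET near_link = map.get_near_link(sq.x, sq.y, sq.azimuth),
    has_link = true
FROM (
    SELECT avl_id, x, y, azimuth
    FROM avl_db.avl_pool
    WHERE NOT has_link
    ORDER BY avl_id
    LIMIT 400
) sq
WHERE a.avl_id = sq.avl_id;

-- Undo the update so the same 400 rows can be processed again later.
ROLLBACK;
```

In the output, "shared hit" counts blocks found in PostgreSQL's shared buffers and "read" counts blocks that had to be requested from the operating system. A cold first run should show a high "read" count, and a repeated run should show mostly "hit". A "read" can still be served from the operating system's file cache rather than the physical disk.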