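I have a batch process that finds the near_link for each avl position. The avl positions are random overall, but roughly normally distributed around cities. The problem is that the first batches take a very long time, while later batches are much faster.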
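The map doesn't change, so my guess is that some kind of statistics get built up, because the batch keeps searching for x,y in the same map over and over.
So how can I help build those statistics before the batch starts? Or how can I check what is happening under the hood?
My concern is that I got these results running the batch on its own. I'm afraid that on the production server the statistics won't be as good, because of all the other requests hitting the map.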
-- Executing query:
SELECT * FROM avl_db.process_near_link();
NOTICE: Duration in seconds= 163.4609 , Rows= 400
NOTICE: Duration in seconds= 68.73396 , Rows= 400
NOTICE: Duration in seconds= 36.93196 , Rows= 400
NOTICE: Duration in seconds= 17.58829 , Rows= 400
NOTICE: Duration in seconds= 12.94885 , Rows= 400
NOTICE: Duration in seconds= 9.509757 , Rows= 400
Total query runtime: 05:09 minutes -- 2400 rows
1 row retrieved.
-- Executing query:
SELECT * FROM avl_db.process_near_link();
NOTICE: Duration in seconds= 8.03767 , Rows= 400
NOTICE: Duration in seconds= 8.51031 , Rows= 400
NOTICE: Duration in seconds= 5.45953 , Rows= 400
NOTICE: Duration in seconds= 4.08547 , Rows= 400
NOTICE: Duration in seconds= 4.19483 , Rows= 400
NOTICE: Duration in seconds= 3.85986 , Rows= 400
Total query runtime: 34.1 secs -- 2400 rows
1 row retrieved.
-- Executing query:
SELECT * FROM avl_db.process_near_link();
NOTICE: Duration in seconds= 3.66540 , Rows= 400
NOTICE: Duration in seconds= 3.55134 , Rows= 400
NOTICE: Duration in seconds= 3.17400 , Rows= 400
NOTICE: Duration in seconds= 3.06982 , Rows= 400
NOTICE: Duration in seconds= 2.96954 , Rows= 400
NOTICE: Duration in seconds= 3.05310 , Rows= 400
NOTICE: Duration in seconds= 2.88948 , Rows= 400
NOTICE: Duration in seconds= 2.77269 , Rows= 400
NOTICE: Duration in seconds= 2.88940 , Rows= 400
NOTICE: Duration in seconds= 2.94150 , Rows= 400
NOTICE: Duration in seconds= 2.84522 , Rows= 400
NOTICE: Duration in seconds= 2.86770 , Rows= 400
NOTICE: Duration in seconds= 2.74608 , Rows= 400
Total query runtime: 39.4 secs -- 5200
1 row retrieved.
This is the batch query:
UPDATE avl_db.avl_pool a
SET near_link = map.get_near_link(sq.X, sq.Y, sq.AZIMUTH),
has_link = true
FROM (
SELECT avl_id, x, y, azimuth
FROM avl_db.avl_pool
WHERE NOT has_link
ORDER BY avl_id
LIMIT 400
) sq
WHERE a.avl_id = sq.avl_id;
"Update on avl_pool a (cost=0.84..3395.28 rows=400 width=151) (actual time=2779.889..2779.889 rows=0 loops=1)"
" -> Nested Loop (cost=0.84..3395.28 rows=400 width=151) (actual time=11.253..2738.711 rows=400 loops=1)"
" -> Subquery Scan on sq (cost=0.42..34.28 rows=400 width=80) (actual time=6.882..8.496 rows=400 loops=1)"
" -> Limit (cost=0.42..30.28 rows=400 width=28) (actual time=6.871..7.964 rows=400 loops=1)"
" -> Index Scan using avl_pool_pkey on avl_pool (cost=0.42..29185.30 rows=391017 width=28) (actual time=6.869..7.873 rows=400 loops=1)"
" Filter: (NOT has_link)"
" Rows Removed by Filter: 10800"
" -> Index Scan using avl_pool_pkey on avl_pool a (cost=0.42..8.14 rows=1 width=79) (actual time=0.003..0.029 rows=1 loops=400)"
" Index Cond: (avl_id = sq.avl_id)"
"Planning time: 0.372 ms"
"Execution time: 2779.970 ms"
Answer 0 (score: 0)
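I'd say you are seeing the effects of caching.
During the first run the data has to be fetched from disk; later runs benefit from data that is already cached (mostly blocks of the avl_pool_pkey index, but also table blocks visited during the previous updates).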
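If caching is indeed the cause, one way to "help" before the batch starts is to load the relevant relations into the cache ahead of time. The following is a minimal sketch, assuming the pg_prewarm contrib extension is installed on the server. The commented-out relation names are hypothetical placeholders: the relations read inside map.get_near_link are not shown in the question.

```sql
-- pg_prewarm ships as a contrib extension (PostgreSQL 9.4 and later).
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the index named in the plan, and its table, into shared buffers.
-- Each call returns the number of blocks that were prewarmed.
SELECT pg_prewarm('avl_db.avl_pool_pkey');
SELECT pg_prewarm('avl_db.avl_pool');

-- Hypothetical names: replace them with the tables and indexes that
-- map.get_near_link actually reads, then uncomment.
-- SELECT pg_prewarm('map.links');
-- SELECT pg_prewarm('map.links_geom_idx');
```

Prewarming only helps for relations that fit in memory, and on a busy production server other requests may evict those blocks again, so the effect should be measured rather than assumed.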
You can verify this with EXPLAIN (ANALYZE, BUFFERS), which shows how many blocks were read from disk and how many were found in the cache.
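As a sketch, the check could look like this for the batch query from the question. The transaction wrapper is an addition here: EXPLAIN ANALYZE really executes the UPDATE, so rolling back keeps the test run from marking rows as processed.

```sql
-- EXPLAIN ANALYZE executes the statement, so run it inside a
-- transaction and roll back to leave avl_pool unchanged.
BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE avl_db.avl_pool a
SET near_link = map.get_near_link(sq.x, sq.y, sq.azimuth),
    has_link = true
FROM (
    SELECT avl_id, x, y, azimuth
    FROM avl_db.avl_pool
    WHERE NOT has_link
    ORDER BY avl_id
    LIMIT 400
) sq
WHERE a.avl_id = sq.avl_id;

ROLLBACK;
```

In the output, each plan node gets a line such as "Buffers: shared hit=... read=...". "hit" counts blocks found in PostgreSQL's shared buffers, and "read" counts blocks that had to be requested from the operating system (which may itself serve them from its file cache or from disk). A slow first batch should show a high "read" count that shrinks in later batches.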