I have two tables:
```sql
CREATE TABLE sf.dir_current (
    id        BIGINT PRIMARY KEY,
    volume_id INTEGER NOT NULL,
    path      VARCHAR NOT NULL
);
CREATE INDEX dir_volid_path_indx ON sf.dir_current (volume_id, path);

CREATE TABLE sf.event (
    id          BIGINT,  -- no primary key here!
    volume_id   INTEGER NOT NULL,
    parent_path VARCHAR NOT NULL,
    type        BIGINT,
    depth       INTEGER
);
```
Table dir_current contains about 50 million rows, all of them with volume_id = 1. Table event contains ~20K rows.
I execute the following query (inside a PL/pgSQL function; VOL_ID, MIN_ID, MAX_ID, etc. are function parameters):
```sql
select dir.id as parent_id, event as event_row
from sf.event as event
left outer join sf.dir_current as dir
    on dir.volume_id = VOL_ID and parent_path = dir.path
where event.volume_id = VOL_ID
  and event.id between MIN_ID and MAX_ID
  and (DEPTH_FILTER is null or event.depth = DEPTH_FILTER)
  and (TYPE_FILTER is null or event.type = TYPE_FILTER)
order by event.depth;
```
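Everything works fine as long as all rows in dir_current have volume_id = 1. After adding a few thousand rows with volume_id = 2 (and running ANALYZE), this query takes a very long time. Here is the EXPLAIN output for the slow query: explain.depesz.com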
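As the plan clearly shows, the query planner does not know that there are many rows with volume_id = 2, and the plan it creates is far from optimal.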
After some debugging I found that ANALYZE had not found any rows with volume_id = 2. I confirmed it with this query:
```
starfish=# SELECT most_common_vals, n_distinct FROM pg_stats WHERE tablename = 'dir_current' and attname = 'volume_id';
 most_common_vals | n_distinct
------------------+------------
 {1}              |          1
(1 row)
```
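To put the real distribution of volume_id next to what ANALYZE recorded, a check along these lines could be run. This is a sketch assuming the sf schema shown above; pg_stats and pg_stat_user_tables are standard system views, and the first query scans the whole ~50M-row table, so it is slow. If most_common_vals is {1} with a most_common_freqs entry of 1, the planner presumably has no statistical evidence for volume_id = 2 and estimates only a handful of rows for it.

```sql
-- Actual distribution of volume_id (full scan, can take a while on 50M rows).
SELECT volume_id, count(*) AS actual_rows
FROM sf.dir_current
GROUP BY volume_id
ORDER BY volume_id;

-- What the planner believes: MCV list with frequencies, distinct-count
-- estimate, and when the table was last analyzed (manually or by autovacuum).
SELECT s.most_common_vals,
       s.most_common_freqs,
       s.n_distinct,
       t.n_live_tup,
       t.last_analyze,
       t.last_autoanalyze
FROM pg_stats AS s
JOIN pg_stat_user_tables AS t
  ON t.schemaname = s.schemaname
 AND t.relname    = s.tablename
WHERE s.schemaname = 'sf'
  AND s.tablename  = 'dir_current'
  AND s.attname    = 'volume_id';
```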
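After running ANALYZE a few more times, it eventually picked up some rows with volume_id = 2, and the query went back to its normal execution time: explain.depesz.com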
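Needing several ANALYZE runs is consistent with sampling: ANALYZE reads a random sample of about 300 × statistics target rows (30,000 at the default default_statistics_target of 100), not the whole table. The back-of-the-envelope estimate below assumes 3,000 rows with volume_id = 2 (the post only says "a few thousand", so this number is hypothetical) and treats the sample as independent uniform row draws. That is a simplification: the real sampler picks pages first, and freshly appended rows sit in relatively few pages, so the actual odds of missing them may be worse.

```sql
-- Probability that ANALYZE's sample contains no row with volume_id = 2.
-- Hypothetical inputs: 3,000 rare rows out of 50,000,000; sample size is
-- 300 * statistics target, modelled as independent uniform draws.
WITH params AS (
    SELECT 50000000::float8 AS total_rows,
           3000::float8     AS rare_rows,
           t.stat_target
    FROM (VALUES (100), (1000), (10000)) AS t(stat_target)
)
SELECT stat_target,
       300 * stat_target                          AS sample_rows,
       300 * stat_target * rare_rows / total_rows AS expected_rare_in_sample,
       power(1 - rare_rows / total_rows, 300 * stat_target)
                                                  AS p_no_rare_row_sampled
FROM params
ORDER BY stat_target;
```

Under these assumptions the default target expects only about 1.8 matching rows in the sample, which leaves roughly a 1-in-6 chance (about 0.165) of sampling none at all; at a target of 10000 the chance is negligible.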
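Question: How can I prevent these extremely long query times? Is there a way to force ANALYZE to find these rows, or to manually modify the statistics for this column? (Setting n_distinct for the volume_id column had no effect.)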
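One standard PostgreSQL knob related to the question is the per-column statistics target, which enlarges the sample ANALYZE takes for that column. The sketch below uses the maximum value, 10000, which should make ANALYZE sample roughly 3,000,000 rows; the cost is a slower ANALYZE and larger statistics entries. The n_distinct override likely did not help because it only replaces the distinct-count estimate and does not put volume_id = 2 into most_common_vals; like SET STATISTICS, it also only takes effect at the next ANALYZE.

```sql
-- Raise the statistics target for this column only (default_statistics_target
-- is 100 by default; 10000 is the maximum allowed value).
ALTER TABLE sf.dir_current ALTER COLUMN volume_id SET STATISTICS 10000;

-- The new target is used only from the next ANALYZE onward.
ANALYZE sf.dir_current (volume_id);

-- Re-check what ANALYZE recorded for the column.
SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE schemaname = 'sf'
  AND tablename  = 'dir_current'
  AND attname    = 'volume_id';
```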
I'm using PostgreSQL 9.5.