I have two tables, a and b. Table a has about 600,000 rows and 6 text columns; table b has about 30,000 rows and 6 text columns. I am trying to run this:
create table c as
select *
from a, b
where a.file_name between b.starting_file_name and b.ending_file_name;
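I have an index on file_name in a, and separate indexes on starting_file_name and ending_file_name in b. Surprisingly, the query takes a little over an hour on my HP ProLiant ML350p server (64 GB RAM).
Here are some of the other Postgres configuration settings: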
shared_buffers = 16GB
work_mem = 1GB
maintenance_work_mem = 1GB
effective_cache_size = 32GB
EXPLAIN output:
Nested Loop (cost=0.00..261971798.23 rows=2685032391 width=250)
  Join Filter: (a.file_name >= b.starting_file_name)
  -> Seq Scan on a (cost=0.00..21144.88 rows=618988 width=162)
  -> Index Scan using b_ending_file_name_idx on b (cost=0.00..228.00 rows=13013 width=88)
       Index Cond: (a.file_name <= b.end_file_name)
I also tried:
create table c as
select *
from a, b
where a.file_name >=b.starting_file_name
and a.file_name<= b.ending_file_name;
The EXPLAIN output is the same:
Nested Loop (cost=0.00..261971798.23 rows=2685032391 width=250)
  Join Filter: (a.file_name >= b.starting_file_name)
  -> Seq Scan on a (cost=0.00..21144.88 rows=618988 width=162)
  -> Index Scan using b_ending_file_name_idx on b (cost=0.00..228.00 rows=13013 width=88)
       Index Cond: (a.file_name <= b.end_file_name)
Any suggestions would be greatly appreciated.
Answer 0 (score: 0)
You may get better results with a composite index on (b.starting_file_name, b.ending_file_name).
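A minimal sketch of that composite index, assuming the table and column names from the question. The index name b_start_end_idx is made up for illustration.

```sql
-- Hypothetical index name; the columns are the ones named in the question.
CREATE INDEX b_start_end_idx ON b (starting_file_name, ending_file_name);

-- Refresh planner statistics for b so the new index is costed with current data.
ANALYZE b;
```

Whether the planner actually prefers this index depends on the data distribution, so it is worth comparing the EXPLAIN output before and after creating it.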
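Also, if the strings are usually unique within a relatively short leading run of characters, you can create an expression index on the substrings and then recheck against the full strings, for example: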
-- Both expressions use left() so that they match the expressions in the query.
CREATE INDEX b_filename_prefixes ON b (
left(starting_file_name, 20),
left(ending_file_name, 20)
);
Then:
select *
from a, b
where
left(a.file_name, 20) between left(b.starting_file_name, 20) and left(b.ending_file_name, 20)
and a.file_name between b.starting_file_name and b.ending_file_name;
I tested this on some simple sample data to confirm that the planner recognizes the index as a candidate, and it does.
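One way to repeat that check on your own data, as a sketch: run the query under EXPLAIN and look for b_filename_prefixes in the plan. This assumes the index expressions match the expressions in the query exactly. It also assumes the collation orders strings so that comparing 20-character prefixes never excludes a row that the full comparison would keep. That holds for byte-wise "C" ordering but may not hold under some locale-aware collations.

```sql
-- Refresh statistics on both tables so the row estimates are current.
ANALYZE a;
ANALYZE b;

-- Show the chosen plan without executing the join.
EXPLAIN
SELECT *
FROM a, b
WHERE left(a.file_name, 20)
      BETWEEN left(b.starting_file_name, 20) AND left(b.ending_file_name, 20)
  AND a.file_name BETWEEN b.starting_file_name AND b.ending_file_name;
```

If the plan still shows only b_ending_file_name_idx, the planner has judged the prefix index to be no cheaper on this data, and the estimated row counts in the plan are the place to look next.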