Question

I have the following two tables and data distribution:

drop table if exists line;
drop table if exists header;

create table header (header_id serial primary key, type character);
create table line (line_id serial primary key, header_id serial not null, type character, constraint line_header foreign key (header_id) references header (header_id)) ;
create index inv_type_idx on header (type);
create index line_type_idx on line (type);

insert into header (type) select case when floor(random()*2+1) = 1 then  'A' else 'B' end from generate_series(1,100000);
insert into line (header_id, type) select header_id,  case when floor(random()*10000+1) = 1 then (case when type ='A' then 'B' else 'A' end) else type end from header, generate_series(1,5);

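The header table has 100,000 rows: 50% of type A and 50% of type B.
The line table has 500,000 rows:
- each header has 5 lines
- overall, 50% of the lines are of type A and 50% of type B
- a line has the same type as its header, except for 0.01% of the lines

Data distribution: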

# select h.type header_type, l.type line_type, count(*) from line l inner join header h on l.header_id = h.header_id group by 1,2 order by 1,2;
 header_type | line_type | count  
-------------+-----------+--------
 A           | A         | 250865
 A           | B         |     25
 B           | A         |     29
 B           | B         | 249081
(4 rows)

I need to get all lines of type B whose header is of type A. Even though the result is very small (25 rows out of 500,000), the plan I get (the following is with PostgreSQL 10) performs a sequential scan on both tables:

explain select * from line l inner join header h on l.header_id = h.header_id where h.type ='A' and l.type='B';
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=2323.29..14632.89 rows=125545 width=19)
   Hash Cond: (l.header_id = h.header_id)
   ->  Seq Scan on line l  (cost=0.00..11656.00 rows=248983 width=13)
         Filter: (type = 'B'::bpchar)
   ->  Hash  (cost=1693.00..1693.00 rows=50423 width=6)
         ->  Seq Scan on header h  (cost=0.00..1693.00 rows=50423 width=6)
               Filter: (type = 'A'::bpchar)
(7 rows)

Is there any way to optimize this type of query, where the data is highly selective, but only when information from several tables is combined?

Of course, as a workaround, I could denormalize the information stored in header into line, which would make this query much faster:

alter table line add column compound_type char(2);
create index compound_idx on line (compound_type);
update line l set compound_type = h.type || l.type from header h where h.header_id = l.header_id;

# explain select * from line where compound_type = 'BA';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using compound_idx on line  (cost=0.42..155.58 rows=50 width=13)
   Index Cond: (compound_type = 'BA'::bpchar)
(2 rows)

But I would rather not do this if possible, because I would have to maintain this duplicated information.

Answer 1

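1) You can use a materialized view with the right indexes. It can be refreshed "in the background". Otherwise it is similar to the compound index on line.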
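A minimal sketch of the materialized view idea, assuming PostgreSQL 9.4 or later so that a concurrent refresh is available. The view name, column aliases, and index names are hypothetical and do not come from the question. The view stores the header type next to each line, so the selective combination can be indexed directly.

```sql
-- Hypothetical names: line_header_types and its indexes are illustrative only.
create materialized view line_header_types as
select l.line_id,
       l.header_id,
       h.type as header_type,
       l.type as line_type
from line l
inner join header h on h.header_id = l.header_id;

-- A unique index is required before REFRESH ... CONCURRENTLY can be used.
create unique index line_header_types_line_id_idx on line_header_types (line_id);

-- Index on the type combination, so the rare pairs can be found without a full scan.
create index line_header_types_types_idx on line_header_types (header_type, line_type);

-- Refresh periodically (for example from a scheduled job); readers are not blocked.
refresh materialized view concurrently line_header_types;

-- Lines of type B whose header is of type A.
select line_id, header_id
from line_header_types
where header_type = 'A' and line_type = 'B';
```

The view is only as fresh as its last refresh, so this fits best when slightly stale results are acceptable. Run analyze on the view after it is populated so the planner has statistics for the new columns.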

2) You can reverse the search so that it starts from the header rows, if you create an index on (line.header_id, line.type) and force a subquery like this:

select header_id 
from header h 
where type='A' and 
    exists(select * from line l where l.header_id=h.header_id and l.type='B')

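Once you have all the headers, select again the lines with the corresponding header_id.

It may be a good idea to include type in some index on header, so that two indexes are all that is needed for the lookup.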
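The steps of this second approach could be put together roughly as follows. This is a sketch, not a guaranteed plan: the index names are hypothetical, and the planner remains free to flatten the subquery and choose a different strategy, so it is worth checking the result with explain.

```sql
-- Hypothetical index names; the columns are the ones suggested in this answer.
create index line_header_id_type_idx on line (header_id, type);
create index header_type_header_id_idx on header (type, header_id);

-- Step 1 (inner query): headers of type A that have at least one line of type B.
-- Step 2 (outer query): fetch the type B lines belonging to those headers.
select l.*
from line l
where l.type = 'B'
  and l.header_id in (
      select h.header_id
      from header h
      where h.type = 'A'
        and exists (select 1
                    from line l2
                    where l2.header_id = h.header_id
                      and l2.type = 'B')
  );
```

With both indexes in place, the inner query can in principle be answered from the indexes alone, which is the point of adding type to the header index.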

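It will still read about 50k rows in the header index and look up each of them in the second index. Generally that is not efficient, but if the indexes fit entirely in memory it may not be that bad.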
