How can I (efficiently) select multiple times from an expensive table expression?
I have a table with a structure like this:
CREATE TABLE facts (
subject_id INT,
visit_id INT,
study_id INT,
provider_id INT,
variable_id INT,
value TEXT
)
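That is, each row is a measurement with several dimensions and a value. The facts table is large, but each dimension is much smaller; for example, the cardinality of SELECT DISTINCT subject_id FROM facts might be a few hundred.
Now I want to find the distinct dimension values for a subset of the facts, i.e. the distinct subject, visit, provider, and variable IDs where study_id = X. This is easy to do with multiple queries: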
SELECT DISTINCT subject_id FROM facts WHERE study_id = X;
SELECT DISTINCT visit_id FROM facts WHERE study_id = X;
SELECT DISTINCT provider_id FROM facts WHERE study_id = X;
SELECT DISTINCT variable_id FROM facts WHERE study_id = X;
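But each query has to do a separate scan of the facts table (or an index). (The cardinality of SELECT * FROM facts WHERE study_id = X is also large, though not as large as the whole table.)
Is there some way to combine these queries so that the database only needs to scan the facts table once and collects all the distinct dimension IDs in one pass?
So far I have tried a common table expression, but that still results in multiple scans over the CTE (in Postgres), so it doesn't help. E.g.: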
WITH selected AS (SELECT * FROM facts WHERE study_id = X)
SELECT DISTINCT subject_id, 1 FROM selected
UNION ALL SELECT DISTINCT visit_id, 2 FROM selected
UNION ALL SELECT DISTINCT provider_id, 3 FROM selected
UNION ALL SELECT DISTINCT variable_id, 4 FROM selected
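Is there a way to get the database to scan facts only once and collect all the required results? I'm particularly interested in support for Postgres and Oracle.
Answer 0 (score: 2)
This could be one approach for Oracle.
Setup: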
create table test(a, b, c) as (
select 1, 2, 3 from dual union all
select 1, 20, 30 from dual union all
select 1, 2, 30 from dual union all
select 1, 20, 30 from dual union all
select 10, 2, 3 from dual union all
select 10, 22, 3 from dual union all
select 1, 20, 3 from dual
)
Query:
select distinct column_name, column_value
from test
unpivot (column_value for column_name in (a, b, c) )
Plan:
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 21 | 336 | 10 (10)| 00:00:01 |
| 1 | HASH UNIQUE | | 21 | 336 | 10 (10)| 00:00:01 |
|* 2 | VIEW | | 21 | 336 | 9 (0)| 00:00:01 |
| 3 | UNPIVOT | | | | | |
| 4 | TABLE ACCESS FULL| TEST | 7 | 273 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("unpivot_view_006"."COLUMN_VALUE" IS NOT NULL)
Result:
C COLUMN_VALUE
- ------------
B 22
C 3
C 30
A 10
A 1
B 2
B 20
I ran the test on a very small table; the plan here shows a single full scan of the table, followed by an UNPIVOT and a HASH UNIQUE.
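Applied to the facts table from the question, the same pattern might look like the sketch below. It is untested against real data; :study_id is a hypothetical bind variable standing in for X, and the inline view is only there to apply the filter and restrict the columns before unpivoting. All four dimension columns are INT, which matters because UNPIVOT requires the unpivoted columns to share a data type.

```sql
-- Sketch only: Oracle 11g or later, assuming the facts table from the question.
-- :study_id is a hypothetical bind variable standing in for X.
select distinct column_name, column_value
from (
    -- filter first, and keep only the dimension columns to unpivot
    select subject_id, visit_id, provider_id, variable_id
    from facts
    where study_id = :study_id
)
unpivot (
    column_value for column_name in (subject_id, visit_id, provider_id, variable_id)
)
```

Each output row pairs a dimension name (SUBJECT_ID, VISIT_ID, and so on) with one of its distinct values. NULL dimension values are dropped, since UNPIVOT excludes nulls by default. Whether the optimizer still picks a single scan on a large table is worth confirming with the actual plan.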
For the same test table, the UNION solution and its plan are:
select distinct a , 'a' from test union
select distinct b , 'b' from test union
select distinct c , 'c' from test
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 21 | 273 | 12 (75)| 00:00:01 |
| 1 | SORT UNIQUE | | 21 | 273 | 12 (75)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
| 3 | TABLE ACCESS FULL| TEST | 7 | 91 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST | 7 | 91 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL| TEST | 7 | 91 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
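Postgres has no UNPIVOT clause, but a similar effect can probably be had with a LATERAL join against a VALUES list, which turns each row of facts into one row per dimension. This is a sketch under assumptions, not something tested on the asker's data: it needs PostgreSQL 9.3 or later for LATERAL, 42 is a made-up study id standing in for X, and the alias names are arbitrary.

```sql
-- Sketch only: PostgreSQL 9.3+, assuming the facts table from the question.
-- 42 is a hypothetical study id standing in for X.
SELECT DISTINCT d.column_name, d.column_value
FROM facts AS f
CROSS JOIN LATERAL (
    -- one output row per dimension column of the current facts row
    VALUES ('subject_id',  f.subject_id),
           ('visit_id',    f.visit_id),
           ('provider_id', f.provider_id),
           ('variable_id', f.variable_id)
) AS d(column_name, column_value)
WHERE f.study_id = 42
  AND d.column_value IS NOT NULL;  -- mirror UNPIVOT, which skips NULLs
```

The facts table appears only once in the FROM clause, so the plan should contain a single scan of it (or of an index on study_id) followed by the de-duplication step; running EXPLAIN on the real table is the way to check that.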