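I have two tables with boolean columns (flags):
process_state: tracks the set of attributes of the unit currently being processed
configuration: uses combinations of those attributes
Simple example: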
process_state
a  b  c
t  t  t
t  f  t
f  f  t

configuration
a  b  c     process_state_class
t  t  t     classa
f  f  f     classb
f  f  NULL  classc
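Now I need to join configuration to process_state to determine which configuration (process_state_class) each process state belongs to, if any. A NULL in configuration means that the column may match any value in process_state (true or false). The process_state table never contains NULL values.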
My current query:
select
    *
from
    process_state ps
    left join configuration c on (
        ((ps.a = c.a) or c.a is null)
        and ((ps.b = c.b) or c.b is null)
        and ((ps.c = c.c) or c.c is null)
    )
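The NULL-as-wildcard semantics can be checked on the sample data above. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; it assumes t/f are stored as 1/0 integers (SQLite has no native boolean type), and the join condition is the same as in the query above.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create the sample tables from the question (t/f as 1/0, NULL as None)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE process_state (a INTEGER, b INTEGER, c INTEGER)")
    conn.execute(
        "CREATE TABLE configuration "
        "(a INTEGER, b INTEGER, c INTEGER, process_state_class TEXT)"
    )
    conn.executemany(
        "INSERT INTO process_state VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 0, 1), (0, 0, 1)],
    )
    conn.executemany(
        "INSERT INTO configuration VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "classa"), (0, 0, 0, "classb"), (0, 0, None, "classc")],
    )
    return conn


def wildcard_join(conn: sqlite3.Connection) -> list:
    """Join each process state to every configuration it satisfies.

    A NULL in a configuration column acts as a wildcard for that column.
    """
    query = """
        SELECT ps.a, ps.b, ps.c, c.process_state_class
        FROM process_state ps
        LEFT JOIN configuration c ON (
                ((ps.a = c.a) OR c.a IS NULL)
            AND ((ps.b = c.b) OR c.b IS NULL)
            AND ((ps.c = c.c) OR c.c IS NULL)
        )
        ORDER BY ps.rowid
    """
    return conn.execute(query).fetchall()


rows = wildcard_join(build_sample_db())
print(rows)  # [(1, 1, 1, 'classa'), (1, 0, 1, None), (0, 0, 1, 'classc')]
```

The state t f t matches no configuration, so the LEFT JOIN returns NULL for its class, while f f t matches classc through the NULL in column c.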
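I have many configurations and many columns (20), and a process state usually belongs to several configurations. This makes the whole query slow.
Is there a different way to write this kind of query in SQL? And since I own these tables, is there a different way to model the problem from the start? I thought of storing only 0/1 strings in both tables, such as: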
process_state
abc
111
101
001
and then using a simple left join. Unfortunately, this does not work for configurations that contain NULLs.
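One way to keep the 0/1 encoding idea while still supporting NULL is to store each configuration as two integers: a hypothetical `mask` with a 1 bit for every column that is specified (not NULL), and a `val` holding the required bits; each process state becomes one integer `bits`. A state then matches a configuration when `(bits & mask) = val`. This is a sketch under those assumptions (the column names are made up, and sqlite3 stands in for PostgreSQL, which also supports `&` on integers). It replaces the chain of OR conditions with one comparison per pair, but it is still a non-equi join, so it lowers the cost of each comparison rather than the number of pairs compared.

```python
import sqlite3
from typing import Optional, Sequence, Tuple


def encode_state(flags: Sequence[bool]) -> int:
    """Pack booleans into an int; the first flag becomes the most significant bit."""
    bits = 0
    for f in flags:
        bits = (bits << 1) | (1 if f else 0)
    return bits


def encode_config(flags: Sequence[Optional[bool]]) -> Tuple[int, int]:
    """Return (mask, val): mask has a 1 bit wherever the flag is not NULL (None)."""
    mask = val = 0
    for f in flags:
        mask = (mask << 1) | (0 if f is None else 1)
        val = (val << 1) | (1 if f else 0)  # None contributes 0 to val
    return mask, val


def bitmask_join(states, configs) -> list:
    """LEFT JOIN states to configs using (bits & mask) = val."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE process_state (id INTEGER PRIMARY KEY, bits INTEGER)")
    conn.execute(
        "CREATE TABLE configuration (mask INTEGER, val INTEGER, process_state_class TEXT)"
    )
    conn.executemany(
        "INSERT INTO process_state (bits) VALUES (?)",
        [(encode_state(s),) for s in states],
    )
    conn.executemany(
        "INSERT INTO configuration VALUES (?, ?, ?)",
        [(*encode_config(f), cls) for f, cls in configs],
    )
    rows = conn.execute(
        """
        SELECT ps.id, c.process_state_class
        FROM process_state ps
        LEFT JOIN configuration c ON (ps.bits & c.mask) = c.val
        ORDER BY ps.id
        """
    ).fetchall()
    conn.close()
    return rows


# Sample data from the question, columns in the order a, b, c.
states = [(True, True, True), (True, False, True), (False, False, True)]
configs = [
    ((True, True, True), "classa"),
    ((False, False, False), "classb"),
    ((False, False, None), "classc"),
]
print(bitmask_join(states, configs))  # [(1, 'classa'), (2, None), (3, 'classc')]
```

With 20 columns the values fit comfortably in an ordinary integer column.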
Here is the explain plan I got on sample data with 100000 rows in process_state, 300 rows in configuration, and 15 boolean attributes:
Nested Loop (cost=0.00..8863403.33 rows=5829 width=30) (actual time=0.145..8673.960 rows=14318 loops=1)
  Join Filter: (((ps.a = c.a) OR (c.a IS NULL)) AND ((ps.b = c.b) OR (c.b IS NULL)) AND ((ps.c = c.c) OR (c.c IS NULL)) AND ((ps.d = c.d) OR (c.d IS NULL)) AND ((ps.e = c.e) OR (c.e IS NULL)) AND ((ps.f = c.f) OR (c.f IS NULL)) AND ((ps.g = c.g) OR (c.g IS NULL)) AND ((ps.h = c.h) OR (c.h IS NULL)) AND ((ps.i = c.i) OR (c.i IS NULL)) AND ((ps.j = c.j) OR (c.j IS NULL)) AND ((ps.k = c.k) OR (c.k IS NULL)) AND ((ps.l = c.l) OR (c.l IS NULL)) AND ((ps.m = c.m) OR (c.m IS NULL)) AND ((ps.n = c.n) OR (c.n IS NULL)) AND ((ps.o = c.o) OR (c.o IS NULL)))
  -> Seq Scan on process_state ps (cost=0.00..1520.21 rows=97921 width=15) (actual time=0.007..11.835 rows=100000 loops=1)
  -> Materialize (cost=0.00..37.15 rows=1810 width=15) (actual time=0.000..0.017 rows=300 loops=100000)
       -> Seq Scan on configuration c (cost=0.00..28.10 rows=1810 width=15) (actual time=0.004..0.036 rows=300 loops=1)
Total runtime: 8674.763 ms
I tried collapsing each table into a single string column and replacing NULL with . in configuration. Now I can use a single-condition regexp join:
select
    *
from
    process_state ps
    left join configuration c on (ps.process_state ~~ c.configuration)
Compared with the explain plan above, there is still no significant improvement:
Nested Loop (cost=0.00..451546.75 rows=150000 width=32) (actual time=154.127..6971.697 rows=22 loops=1)
  Join Filter: (ps.process_state ~~ c.configuration)
  -> Seq Scan on process_state2 ps (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.006..12.700 rows=100000 loops=1)
  -> Materialize (cost=0.00..6.50 rows=300 width=16) (actual time=0.000..0.020 rows=300 loops=100000)
       -> Seq Scan on configuration2 c (cost=0.00..5.00 rows=300 width=16) (actual time=0.004..0.036 rows=300 loops=1)
Total runtime: 6971.721 ms
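One detail worth checking in the query above: in PostgreSQL, `~~` is the operator form of LIKE, while POSIX regular-expression matching is `~`. Under LIKE, `.` is an ordinary character (the wildcards are `_` and `%`), so a configuration containing `.` only matches a process state that literally contains a dot. That may be why the second plan returns 22 rows against 14318 in the first, although the two runs used different tables, so this is not verified. The sketch below illustrates the difference on 0/1 strings, using SQLite's LIKE (same `_`/`%` wildcard rules for these strings) and Python's `re` for the regex side.

```python
import re
import sqlite3


def like_matches(state: str, pattern: str) -> bool:
    """Evaluate `state LIKE pattern` in SQLite; '.' is a literal character here."""
    conn = sqlite3.connect(":memory:")
    (result,) = conn.execute("SELECT ? LIKE ?", (state, pattern)).fetchone()
    conn.close()
    return bool(result)


def regex_matches(state: str, pattern: str) -> bool:
    """Regex search where '.' matches any single character.

    PostgreSQL's `~` is unanchored too; with equal-length strings made of
    0/1 and '.', any match must span the whole string anyway.
    """
    return re.search(pattern, state) is not None


print(like_matches("101", "1.1"))   # False: LIKE compares '.' literally
print(regex_matches("101", "1.1"))  # True: regex '.' is a wildcard
print(like_matches("101", "1_1"))   # True: '_' is the LIKE wildcard
```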
I also tried replacing NULL with _ and using the LIKE operator, with no improvement.
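For reference, the `_` variant of the single-column encoding can be reproduced on the sample data. This sketch assumes hypothetical text columns `process_state` and `configuration` holding 0/1 strings in the fixed column order a, b, c, with `_` in place of NULL, and again uses sqlite3 instead of PostgreSQL. It checks correctness only: it returns the same classes as the multi-column query, but says nothing about speed.

```python
import sqlite3


def encode(flags, null_char: str = "_") -> str:
    """Turn a row of booleans (None = NULL) into a 0/1 string."""
    return "".join(null_char if f is None else ("1" if f else "0") for f in flags)


def like_join(states, configs) -> list:
    """LEFT JOIN encoded states to encoded configurations with LIKE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE process_state (id INTEGER PRIMARY KEY, process_state TEXT)"
    )
    conn.execute(
        "CREATE TABLE configuration (configuration TEXT, process_state_class TEXT)"
    )
    conn.executemany(
        "INSERT INTO process_state (process_state) VALUES (?)",
        [(encode(s),) for s in states],
    )
    conn.executemany(
        "INSERT INTO configuration VALUES (?, ?)",
        [(encode(f), cls) for f, cls in configs],
    )
    rows = conn.execute(
        """
        SELECT ps.process_state, c.process_state_class
        FROM process_state ps
        LEFT JOIN configuration c ON ps.process_state LIKE c.configuration
        ORDER BY ps.id
        """
    ).fetchall()
    conn.close()
    return rows


states = [(True, True, True), (True, False, True), (False, False, True)]
configs = [
    ((True, True, True), "classa"),
    ((False, False, False), "classb"),
    ((False, False, None), "classc"),  # encoded as "00_"
]
print(like_join(states, configs))
# [('111', 'classa'), ('101', None), ('001', 'classc')]
```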
Answer 0 (score: 0)
You could try this, but I'm not sure it will be faster:
select
    *
from
    process_state ps
    left join configuration c
        on ps.a = coalesce(c.a, ps.a)
        and ps.b = coalesce(c.b, ps.b)
        and ps.c = coalesce(c.c, ps.c)
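Because process_state never contains NULL, `ps.a = coalesce(c.a, ps.a)` is true exactly when `ps.a = c.a` or `c.a IS NULL`, so this condition should return the same rows as the question's. A quick check on the sample data (sqlite3 standing in for PostgreSQL, booleans as 1/0 integers) compares both; the extra configuration `classd` is hypothetical, added so that some states match more than one configuration. Whether PostgreSQL evaluates one form faster is a separate question that only EXPLAIN ANALYZE on the real data can answer.

```python
import sqlite3

OR_NULL = """
    SELECT ps.rowid, c.process_state_class
    FROM process_state ps
    LEFT JOIN configuration c ON ((ps.a = c.a) OR c.a IS NULL)
                             AND ((ps.b = c.b) OR c.b IS NULL)
                             AND ((ps.c = c.c) OR c.c IS NULL)
    ORDER BY ps.rowid, c.process_state_class
"""

COALESCE = """
    SELECT ps.rowid, c.process_state_class
    FROM process_state ps
    LEFT JOIN configuration c ON ps.a = coalesce(c.a, ps.a)
                             AND ps.b = coalesce(c.b, ps.b)
                             AND ps.c = coalesce(c.c, ps.c)
    ORDER BY ps.rowid, c.process_state_class
"""


def compare_conditions():
    """Run both join conditions on the same data and return both result sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE process_state (a INTEGER, b INTEGER, c INTEGER)")
    conn.execute(
        "CREATE TABLE configuration "
        "(a INTEGER, b INTEGER, c INTEGER, process_state_class TEXT)"
    )
    conn.executemany(
        "INSERT INTO process_state VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 0, 1), (0, 0, 1)],
    )
    conn.executemany(
        "INSERT INTO configuration VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, "classa"),
            (0, 0, 0, "classb"),
            (0, 0, None, "classc"),
            (None, None, 1, "classd"),  # hypothetical: matches any state with c = t
        ],
    )
    or_rows = conn.execute(OR_NULL).fetchall()
    co_rows = conn.execute(COALESCE).fetchall()
    conn.close()
    return or_rows, co_rows


or_rows, co_rows = compare_conditions()
print(or_rows == co_rows)  # True
print(co_rows)
# [(1, 'classa'), (1, 'classd'), (2, 'classd'), (3, 'classc'), (3, 'classd')]
```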