Left-joining large boolean matrices in PostgreSQL

Time: 2014-03-27 13:48:12

Tags: sql postgresql

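I have two matrices with boolean columns (flags):

  • process_state - tracks the set of attributes of the unit being processed
  • configuration - defines process state classes using combinations of attributes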

SQLFiddle

Simple example:

process_state
a   b   c
t   t   t
t   f   t
f   f   t

configuration
a   b   c    process_state_class
t   t   t    classa
f   f   f    classb
f   f   NULL classc

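Now I need to left join configuration to process_state in order to determine which configurations (process state classes), if any, each process state belongs to. A NULL in configuration means that the column may hold any value (true or false) in process_state. The process_state table itself never contains NULL values.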
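
To reproduce the simple example, the two tables could be created roughly like this. This is a sketch: the table and column names come from the example above, while the column types and the absence of keys and indexes are assumptions.

```sql
-- Assumed schema: plain boolean flags, no keys or indexes.
create table process_state (
    a boolean not null,
    b boolean not null,
    c boolean not null
);

create table configuration (
    a boolean,  -- NULL means "any value is accepted"
    b boolean,
    c boolean,
    process_state_class text not null
);

insert into process_state (a, b, c) values
    (true,  true,  true),
    (true,  false, true),
    (false, false, true);

insert into configuration (a, b, c, process_state_class) values
    (true,  true,  true,  'classa'),
    (false, false, false, 'classb'),
    (false, false, null,  'classc');
```

With this data, (t, t, t) belongs to classa, (f, f, t) belongs to classc through the NULL wildcard in column c, and (t, f, t) matches no configuration, so the left join returns it with NULLs on the configuration side.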

This is what I have:

   select
     *
   from
     process_state ps
     left join configuration c on (
            ((ps.a = c.a) or c.a is null)
        and ((ps.b = c.b) or c.b is null)
        and ((ps.c = c.c) or c.c is null)
     )

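I have many configurations and many columns (20), and usually one process state belongs to more than one configuration. This makes the whole query slow.

Is there a different way to write this kind of query in SQL? And since I own these tables, is there a different way to model the problem from the start? I thought about storing just 0/1 strings in both tables, like: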

 process_state
 abc
 111
 101
 001

and then using a simple left join. Unfortunately, this does not work for configurations that contain NULLs.

More realistic data on SQLFiddle

Here is the explain plan I get for sample data with 100000 rows in process_state, 300 rows in configuration, and 15 boolean attributes:

Nested Loop  (cost=0.00..8863403.33 rows=5829 width=30) (actual time=0.145..8673.960 rows=14318 loops=1)
  Join Filter: (((ps.a = c.a) OR (c.a IS NULL)) AND ((ps.b = c.b) OR (c.b IS NULL)) AND ((ps.c = c.c) OR (c.c IS NULL)) AND ((ps.d = c.d) OR (c.d IS NULL)) AND ((ps.e = c.e) OR (c.e IS NULL)) AND ((ps.f = c.f) OR (c.f IS NULL)) AND ((ps.g = c.g) OR (c.g IS NULL)) AND ((ps.h = c.h) OR (c.h IS NULL)) AND ((ps.i = c.i) OR (c.i IS NULL)) AND ((ps.j = c.j) OR (c.j IS NULL)) AND ((ps.k = c.k) OR (c.k IS NULL)) AND ((ps.l = c.l) OR (c.l IS NULL)) AND ((ps.m = c.m) OR (c.m IS NULL)) AND ((ps.n = c.n) OR (c.n IS NULL)) AND ((ps.o = c.o) OR (c.o IS NULL)))
  ->  Seq Scan on process_state ps  (cost=0.00..1520.21 rows=97921 width=15) (actual time=0.007..11.835 rows=100000 loops=1)
  ->  Materialize  (cost=0.00..37.15 rows=1810 width=15) (actual time=0.000..0.017 rows=300 loops=100000)
        ->  Seq Scan on configuration c  (cost=0.00..28.10 rows=1810 width=15) (actual time=0.004..0.036 rows=300 loops=1)
Total runtime: 8674.763 ms

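I tried collapsing the columns of each table into a single column, replacing NULL with . in configuration. Now I can use a single-condition regexp join: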

select
  *
from
  process_state ps
  left join configuration c on (ps.process_state ~~ c.configuration)

Still no significant improvement compared with the explain plan above:

Nested Loop  (cost=0.00..451546.75 rows=150000 width=32) (actual time=154.127..6971.697 rows=22 loops=1)
  Join Filter: (ps.process_state ~~ c.configuration)
  ->  Seq Scan on process_state2 ps  (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.006..12.700 rows=100000 loops=1)
  ->  Materialize  (cost=0.00..6.50 rows=300 width=16) (actual time=0.000..0.020 rows=300 loops=100000)
        ->  Seq Scan on configuration2 c  (cost=0.00..5.00 rows=300 width=16) (actual time=0.004..0.036 rows=300 loops=1)
Total runtime: 6971.721 ms

I also tried replacing NULL with _ and using the like operator; no improvement.
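
For reference, collapsed columns for the like variant could be derived along these lines. This is a sketch only: the table names process_state2 and configuration2 appear in the plan above, but the statements themselves, and the reduction to three flags, are assumptions.

```sql
-- Collapse the boolean flags into a string of 0/1 characters.
-- boolean -> integer -> text yields '1' for true and '0' for false.
create table process_state2 as
select a::int::text || b::int::text || c::int::text as process_state
from process_state;

-- In configuration, NULL becomes '_', the single-character wildcard of LIKE.
create table configuration2 as
select coalesce(a::int::text, '_')
    || coalesce(b::int::text, '_')
    || coalesce(c::int::text, '_') as configuration,
       process_state_class
from configuration;

-- LIKE is shown as the ~~ operator in EXPLAIN output.
select *
from process_state2 ps
left join configuration2 c on ps.process_state like c.configuration;
```

The pattern differs per configuration row, so every pair of rows still has to be compared.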

1 answer:

Answer 0 (score: 0)

You can try this, but I'm not sure it will be faster:

select
 *
from
 process_state ps
 left join configuration c 
         on ps.a = coalesce(c.a, ps.a)
        and ps.b = coalesce(c.b, ps.b)
        and ps.c = coalesce(c.c, ps.c)
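
Another variant that may be worth measuring is to pack the flags into integers and compare them through a bit mask, so that the join condition is one integer expression instead of one test per column. This is only a sketch: the columns bits, mask and val are hypothetical additions, and the planner will most likely still use a nested loop over all pairs, so any gain would come from a cheaper per-pair comparison, not from a different plan.

```sql
-- Hypothetical extra columns; bit 2 = a, bit 1 = b, bit 0 = c.
alter table process_state add column bits int;
alter table configuration add column mask int;  -- 1 where the flag is fixed
alter table configuration add column val  int;  -- required value of the fixed flags

update process_state
set bits = a::int * 4 + b::int * 2 + c::int;

update configuration
set mask = (a is not null)::int * 4
         + (b is not null)::int * 2
         + (c is not null)::int,
    val  = coalesce(a::int, 0) * 4
         + coalesce(b::int, 0) * 2
         + coalesce(c::int, 0);

-- A configuration matches when the state agrees with it on every fixed bit.
select *
from process_state ps
left join configuration c on (ps.bits & c.mask) = c.val;
```

For the three-column example, classc gets mask 6 and val 0, so the state (f, f, t) with bits 1 matches it because 1 & 6 = 0. With 20 flags the packed values still fit into a 4-byte integer.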