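I have a modified TPC-H Query 20 that behaves unexpectedly depending on its predicates. I have narrowed the query down to the main problem. Apart from QIDTABLE, which has a single column holding ids (1, 2, 3, ...), all tables are the default TPC-H tables. The following is the base query. It runs in parallel and uses all of the assigned CPUs, as it should.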
select
qid, ps_suppkey
from
tpch.partsupp, tpch.part, tpch.qidtable
where
qid < 1
and (
(p_name like 'burlywood%' and qid = 0)
)
and ps_availqty > (
select 0.5 * sum(l_quantity)
from tpch.lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and (
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0)
)
)
However, if I add more predicates involving qid, the query uses only one core and takes forever to finish. Here is an example:
select
qid, ps_suppkey
from
tpch.partsupp, tpch.part, tpch.qidtable
where
qid < 2
and (
(p_name like 'burlywood%' and qid = 0) or
(p_name like 'burlywood%' and qid = 1)
)
and ps_availqty > (
select 0.5 * sum(l_quantity)
from tpch.lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and (
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0) or
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 1)
)
)
I found that this is caused by the following predicates in the inner select that computes the sum:
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0) or
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 1)
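Having multiple OR-ed predicates here disables parallel execution. In this case, since both predicates use the same dates, I can rewrite them as: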
((l_shipdate between '1994-01-01' and '1995-01-01') and (qid = 0 or qid = 1))
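In this form the query runs in parallel again, but in general the dates differ and I cannot combine them.
What exactly is the difference between these two versions?
Edit: Here is a more complex query with different predicate values, for better understanding: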
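The factoring step above relies on the distributive law, (A and B) or (A and C) = A and (B or C). As a minimal sketch, the script below checks this on a hypothetical toy table `t` (not part of TPC-H, made up purely for illustration). It only shows that the two forms select the same rows; it says nothing about how a given optimizer plans or parallelizes either one.

```sql
-- Hypothetical toy table standing in for rows that carry a ship date and a qid.
create table t (l_shipdate date, qid integer);
insert into t values
  ('1994-06-15', 0),
  ('1994-06-15', 1),
  ('1994-06-15', 2),
  ('1996-06-15', 0),
  ('1996-06-15', 1);

-- Form 1: the date range is repeated in each OR branch.
select count(*) as or_form
from t
where (l_shipdate between '1994-01-01' and '1995-01-01' and qid = 0)
   or (l_shipdate between '1994-01-01' and '1995-01-01' and qid = 1);

-- Form 2: the common date range is factored out of the OR.
select count(*) as factored_form
from t
where l_shipdate between '1994-01-01' and '1995-01-01'
  and (qid = 0 or qid = 1);
```

Both counts come out equal for this data, since only the two 1994 rows with qid 0 and 1 satisfy either form.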
select
qid, ps_suppkey
from
tpch.partsupp, tpch.part, tpch.qidtable
where
qid < 3
and (
(p_name like 'burlywood%' and qid = 0) or
(p_name like 'bisque%' and qid = 1) or
(p_name like 'almond%' and qid = 2)
)
and ps_availqty > (
select 0.5 * sum(l_quantity)
from tpch.lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and (
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0) or
((l_shipdate between '1997-01-01' and '1998-01-01') and qid = 1) or
((l_shipdate between '1992-01-01' and '1993-01-01') and qid = 2)
)
)
Answer 0 (score: 0)
You can simplify the where predicate of the second query considerably. This should be equivalent.
where
-- qid < 2 is redundant: the qid in (0, 1) predicate below already implies it
p_name like 'burlywood%'
and qid in (0, 1)
and ps_availqty >
(
select 0.5 * sum(l_quantity)
from tpch.lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate between '1994-01-01' and '1995-01-01'
and qid in (0, 1)
)
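Note that this simplification only works because both branches share the same name pattern and date range. For the query in the edit, where each qid has its own values, one possible rewrite is to split the query by qid and combine the pieces with UNION ALL, so that every branch has a purely conjunctive predicate. The following is only a sketch under assumptions: it reuses the from clause of the question unchanged, relies on the qid values being disjoint so that no row is returned twice, and has not been tested for its effect on parallel execution in any particular DBMS.

```sql
-- Branch for qid = 0 with its own pattern and date range.
select qid, ps_suppkey
from tpch.partsupp, tpch.part, tpch.qidtable
where qid = 0
  and p_name like 'burlywood%'
  and ps_availqty > (
    select 0.5 * sum(l_quantity)
    from tpch.lineitem
    where l_partkey = ps_partkey
      and l_suppkey = ps_suppkey
      and l_shipdate between '1994-01-01' and '1995-01-01'
  )
union all
-- Branch for qid = 1.
select qid, ps_suppkey
from tpch.partsupp, tpch.part, tpch.qidtable
where qid = 1
  and p_name like 'bisque%'
  and ps_availqty > (
    select 0.5 * sum(l_quantity)
    from tpch.lineitem
    where l_partkey = ps_partkey
      and l_suppkey = ps_suppkey
      and l_shipdate between '1997-01-01' and '1998-01-01'
  )
union all
-- Branch for qid = 2.
select qid, ps_suppkey
from tpch.partsupp, tpch.part, tpch.qidtable
where qid = 2
  and p_name like 'almond%'
  and ps_availqty > (
    select 0.5 * sum(l_quantity)
    from tpch.lineitem
    where l_partkey = ps_partkey
      and l_suppkey = ps_suppkey
      and l_shipdate between '1992-01-01' and '1993-01-01'
  );
```

Because each branch fixes qid to a single value, the subquery no longer needs to reference qid at all, which removes the OR from the correlated predicate. Comparing the execution plans of this form and the original is the way to see whether the optimizer treats them differently.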