Adding a predicate disables parallel execution

Time: 2017-06-23 17:53:38

Tags: sql-server parallel-processing

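I have a modified TPC-H Query 20 that behaves unexpectedly depending on its predicates. I have narrowed the query down to the core of the problem. All tables are the default TPC-H tables except QIDTABLE, which has a single column of ids (1, 2, 3, ...). The basic query is below. It runs in parallel and uses all of the CPUs it is allowed to use.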

  select 
    qid, ps_suppkey
  from
    tpch.partsupp, tpch.part, tpch.qidtable 
  where
      qid < 1
      and (
            (p_name like 'burlywood%' and qid = 0)
          )
      and ps_availqty > (
        select 0.5 * sum(l_quantity)
        from tpch.lineitem
        where
           l_partkey = ps_partkey
           and l_suppkey = ps_suppkey
           and (
                 ((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0)
               )
      )

However, if I add more predicates involving qid, the query uses only one core and takes forever to finish. Here is an example:

  select 
    qid, ps_suppkey
  from
    tpch.partsupp, tpch.part, tpch.qidtable 
  where
      qid < 2
      and (
            (p_name like 'burlywood%' and qid = 0) or
            (p_name like 'burlywood%' and qid = 1)
          )
      and ps_availqty > (
        select 0.5 * sum(l_quantity)
        from tpch.lineitem
        where
           l_partkey = ps_partkey
           and l_suppkey = ps_suppkey
           and (
             ((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0) or
             ((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 1)
               )
      )

I found that this is caused by the predicates in the inner select that computes the sum:

((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0) or
((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 1)

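Having multiple OR-ed predicates here disables parallel execution. In this case, because both predicates use the same date range, I can rewrite them as: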

((l_shipdate between '1994-01-01' and '1995-01-01') and (qid = 0 or qid = 1))

In this form the query runs in parallel again, but in general the date ranges differ and I cannot combine them.

What exactly is the difference between these two versions?

Edit: Here is a more complex query with different predicate values, to make the problem clearer:

select 
  qid, ps_suppkey
from
  tpch.partsupp, tpch.part, tpch.qidtable 
where
  qid < 3
  and (
        (p_name like 'burlywood%' and qid = 0) or
        (p_name like 'bisque%' and qid = 1) or
        (p_name like 'almond%' and qid = 2)
      )
  and ps_availqty > (
    select 0.5 * sum(l_quantity)
    from tpch.lineitem
    where
       l_partkey = ps_partkey
       and l_suppkey = ps_suppkey
       and (
            ((l_shipdate between '1994-01-01' and '1995-01-01') and qid = 0) or
            ((l_shipdate between '1997-01-01' and '1998-01-01') and qid = 1) or
            ((l_shipdate between '1992-01-01' and '1993-01-01') and qid = 2)
           )
  )

1 answer:

Answer 0: (score: 0)

You can greatly simplify the where predicate in the second query. This should be equivalent:

where
    --qid < 2 this is redundant, it already deals with this in the next predicate
    p_name like 'burlywood%'
    and qid in (0, 1)
    and ps_availqty > 
    (
        select 0.5 * sum(l_quantity)
        from tpch.lineitem
        where
            l_partkey = ps_partkey
            and l_suppkey = ps_suppkey
            and l_shipdate between '1994-01-01' and '1995-01-01'
            and qid in (0, 1)
    )
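
The simplification above relies on both qid values sharing the same name pattern and date range, so it does not carry over directly to the edited query, where each qid has its own values. One possible approach, offered as an untested sketch rather than a confirmed fix, is to split that query into one branch per qid and combine the branches with UNION ALL. Each branch then has a single date range in its subquery and no OR. Because the branches are disjoint on qid, the combined result should match the original query. Whether SQL Server actually chooses a parallel plan for each branch is an assumption that would need to be checked against the execution plan.

```sql
-- Sketch only: one branch per qid value, taken from the edited query.
-- The comma joins are kept exactly as in the question.
select qid, ps_suppkey
from tpch.partsupp, tpch.part, tpch.qidtable
where
    qid = 0
    and p_name like 'burlywood%'
    and ps_availqty >
    (
        select 0.5 * sum(l_quantity)
        from tpch.lineitem
        where
            l_partkey = ps_partkey
            and l_suppkey = ps_suppkey
            and l_shipdate between '1994-01-01' and '1995-01-01'
    )

union all

select qid, ps_suppkey
from tpch.partsupp, tpch.part, tpch.qidtable
where
    qid = 1
    and p_name like 'bisque%'
    and ps_availqty >
    (
        select 0.5 * sum(l_quantity)
        from tpch.lineitem
        where
            l_partkey = ps_partkey
            and l_suppkey = ps_suppkey
            and l_shipdate between '1997-01-01' and '1998-01-01'
    )

union all

select qid, ps_suppkey
from tpch.partsupp, tpch.part, tpch.qidtable
where
    qid = 2
    and p_name like 'almond%'
    and ps_availqty >
    (
        select 0.5 * sum(l_quantity)
        from tpch.lineitem
        where
            l_partkey = ps_partkey
            and l_suppkey = ps_suppkey
            and l_shipdate between '1992-01-01' and '1993-01-01'
    )
```

The trade-off is that the query text grows with the number of qid values and the base tables are referenced once per branch, so this is only practical for a small, fixed set of qids.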