Subquery returns multiple rows for a like comparison

Question

I have the table shown below.

[image: screenshot of the table]

In the following query, the outer query compares its tag column, using like, against the tag values produced by the subquery.

 SELECT top 6 *
  FROM [piarchive].[picomp2]
  WHERE tag Like
  (
  Select distinct left(tag,19) + '%' 
  from (SELECT  *
  FROM [piarchive].[picomp2]
  WHERE tag like '%CPU_Active' and  time between '2014/10/02 15:13:08'and'2014/10/02 15:18:37'
  and value=-524289 order by time desc) as t1
  )  
  and tag not like '%CPU_Active' and tag not like '%Program%'
  and time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
  order by time desc

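However, the subquery returns multiple rows, which causes the following error:

Error: "When used as an expression, a subquery can return at most one row."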

Answer 1

Replace the where tag like (...) part (where ... is the subquery, omitted here for brevity) with where exists (...), and move the like comparison into the subquery.

select top 6
    *
from
    [piarchive].[picomp2] t0
where
    exists
    (
        select
            *
        from
            (
                select
                    *
                from
                    [piarchive].[picomp2]
                where
                    tag like '%cpu_active' and time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
                    and
                    value = -524289
            )
            as t1
        where
            t0.tag like left(t1.tag, 19) + '%' 
    ) 
    and
    tag not like '%cpu_active'
    and
    tag not like '%program%'
    and
    time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
order by
    time desc;

I have added a table alias to the outer query to disambiguate the tag column, and as you can see, the like comparison has moved into the subquery.
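To see the rewrite in isolation, here is a minimal, self-contained sketch against made-up data. It assumes SQL Server 2008 or later. The temp table #picomp2_demo, its column types, and every tag and value in it are hypothetical stand-ins for [piarchive].[picomp2], chosen so that the first 19 characters of each tag identify one server.

```sql
-- Hypothetical sample data: table name, column types and rows are invented for illustration.
if object_id('tempdb..#picomp2_demo') is not null drop table #picomp2_demo;

create table #picomp2_demo (tag varchar(50) not null, [time] datetime not null, value int not null);

insert into #picomp2_demo (tag, [time], value) values
    ('PLANT1.SERVER-0001.CPU_Active', '2014-10-02T15:14:00', -524289),
    ('PLANT1.SERVER-0001.Mem_Used',   '2014-10-02T15:14:30', 42),
    ('PLANT1.SERVER-0002.CPU_Active', '2014-10-02T15:15:00', -524289),
    ('PLANT1.SERVER-0002.Disk_Free',  '2014-10-02T15:15:30', 7),
    ('PLANT1.SERVER-0003.CPU_Active', '2014-10-02T15:16:00', 1),
    ('PLANT1.SERVER-0003.Mem_Used',   '2014-10-02T15:16:30', 99);

-- "where tag like (subquery)" would fail here, because two CPU_Active rows
-- have value = -524289 and the subquery would therefore yield two patterns.
-- exists tests each candidate pattern row by row instead.
select t0.tag, t0.[time], t0.value
from #picomp2_demo t0
where exists
    (
        select 1
        from #picomp2_demo t1
        where t1.tag like '%CPU_Active'
          and t1.value = -524289
          and t0.tag like left(t1.tag, 19) + '%'   -- correlated like comparison
    )
  and t0.tag not like '%CPU_Active'
order by t0.[time] desc;
```

With this sample data the query should return the Disk_Free row for SERVER-0002 followed by the Mem_Used row for SERVER-0001. The SERVER-0003 rows are excluded because that server's CPU_Active value is not -524289.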

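I can't vouch for how this will perform on a large data set, but that is a different topic. Personally, I would look for a way to get rid of the subquery altogether, since everything is querying the same table.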
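One possible way to do that, offered only as a sketch and not as the method used in this answer, is a windowed aggregate that flags every 19-character prefix having a matching CPU_Active row, so the table is read once. It assumes SQL Server 2008 or later, and the temp table #picomp2_win with its rows is hypothetical. Note that it compares prefixes by equality instead of like, which gives the same result only if every tag is at least 19 characters long and the prefix contains no like wildcard characters (%, _ or [).

```sql
-- Hypothetical sample data: table name, column types and rows are invented for illustration.
if object_id('tempdb..#picomp2_win') is not null drop table #picomp2_win;

create table #picomp2_win (tag varchar(50) not null, [time] datetime not null, value int not null);

insert into #picomp2_win (tag, [time], value) values
    ('PLANT1.SERVER-0001.CPU_Active', '2014-10-02T15:14:00', -524289),
    ('PLANT1.SERVER-0001.Mem_Used',   '2014-10-02T15:14:30', 42),
    ('PLANT1.SERVER-0002.CPU_Active', '2014-10-02T15:15:00', -524289),
    ('PLANT1.SERVER-0002.Disk_Free',  '2014-10-02T15:15:30', 7),
    ('PLANT1.SERVER-0003.CPU_Active', '2014-10-02T15:16:00', 1),
    ('PLANT1.SERVER-0003.Mem_Used',   '2014-10-02T15:16:30', 99);

with flagged as
(
    select
        tag, [time], value,
        -- 1 for every row whose 19-character prefix has at least one
        -- CPU_Active row with value = -524289 inside the time window
        max(case when tag like '%CPU_Active' and value = -524289 then 1 else 0 end)
            over (partition by left(tag, 19)) as prefix_has_match
    from #picomp2_win
    where [time] between '2014-10-02T15:13:08' and '2014-10-02T15:18:37'
)
select top 6 tag, [time], value
from flagged
where prefix_has_match = 1
  and tag not like '%CPU_Active'
  and tag not like '%Program%'
order by [time] desc;
```

On this sample data it should return the same two rows as the exists version (Disk_Free for SERVER-0002, then Mem_Used for SERVER-0001). Whether it is actually faster on real data would have to be measured.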

More on optimization

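This is not easy to optimize, and indexes will be of little use here, for the following reasons:

The join condition (t0.tag like left(t1.tag, 19) + '%') is not a simple one, and the query optimizer may struggle to produce anything better than nested loops (i.e. executing the subquery once for every row of the outer query). This is probably your biggest performance killer here.

None of the like comparisons against literal patterns can take advantage of a table index, because each pattern begins with a wildcard: they check the end (or the middle) of the value, not the start.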
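The difference between a prefix pattern and a leading-wildcard pattern can be sketched as follows. The temp table #tags_demo, its index cx_tag and its rows are hypothetical, and the behaviour described in the comments is what SQL Server typically does, not something guaranteed for every data set.

```sql
-- Hypothetical sample data: table name, index name and rows are invented for illustration.
if object_id('tempdb..#tags_demo') is not null drop table #tags_demo;

create table #tags_demo (tag varchar(50) not null);
create clustered index cx_tag on #tags_demo (tag);

insert into #tags_demo (tag) values
    ('PLANT1.SERVER-0001.CPU_Active'),
    ('PLANT1.SERVER-0001.Mem_Used'),
    ('PLANT1.SERVER-0002.CPU_Active'),
    ('PLANT1.SERVER-0002.Disk_Free');

-- Prefix pattern: the fixed leading characters define a range of index keys,
-- so the optimizer can seek into cx_tag.
select tag from #tags_demo where tag like 'PLANT1.SERVER-0001.%';

-- Leading wildcard: there is no fixed start to seek to,
-- so every row has to be examined.
select tag from #tags_demo where tag like '%CPU_Active';
```

Comparing the actual execution plans of the two statements should typically show a seek for the first and a scan for the second, although with only a handful of rows the cost difference is negligible.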

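Your only hope may be that the date range check is highly selective (that is, it eliminates a lot of records). Since the same check on the time field is performed in both the outer and the inner query, you could select the rows in that range into a temp table: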

-- key is a reserved word in T-SQL, so the column name has to be bracketed
select left(tag, 19) as [key], *
into #working
from [piarchive].[picomp2]
where [time] between '2014/10/02 15:13:08' and '2014/10/02 15:18:37';

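#working now contains only the records from the specified time period. Since your example range is quite narrow (only 5 1/2 minutes), I would hazard a guess that this eliminates around 99% of the records. An index on time will speed this step up significantly. Once it is done, you are dealing with only a fraction of the data.

Then, possibly (see below), index the key column: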

create clustered index cx_key on #working ([key]);

Then complete the rest of the query:

select a.*
from #working a
where exists
    (
        -- another row in the same time window with the same 19-character prefix
        select *
        from #working b
        where a.[key] = b.[key]
          and b.tag like '%cpu_active'
    )
  and a.tag not like '%program%'
  and a.tag not like '%cpu_active';

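What I have done is create a clustered index on the join condition (the first 19 characters of tag) to optimize the subquery. You will have to test this, because if the cost of creating the index in the first place outweighs the benefit, it may make no difference or even slow things down. That depends on how much data you have, among other factors. I saw only a minimal gain from doing this (about a 5% speed-up), although I was running it against just a few hundred rows of test data. The more data you have, the more effective it should be.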
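One way to run that test, sketched end to end with made-up data, is to execute the final query before and after creating the index while set statistics time and set statistics io are on. They report CPU time, elapsed time and logical reads in the Messages output. The source table #picomp2_src and its rows are hypothetical stand-ins for [piarchive].[picomp2], and SQL Server 2008 or later is assumed.

```sql
-- Hypothetical sample data: table name, column types and rows are invented for illustration.
if object_id('tempdb..#picomp2_src') is not null drop table #picomp2_src;
if object_id('tempdb..#working') is not null drop table #working;

create table #picomp2_src (tag varchar(50) not null, [time] datetime not null, value int not null);

insert into #picomp2_src (tag, [time], value) values
    ('PLANT1.SERVER-0001.CPU_Active', '2014-10-02T15:14:00', -524289),
    ('PLANT1.SERVER-0001.Mem_Used',   '2014-10-02T15:14:30', 42),
    ('PLANT1.SERVER-0002.CPU_Active', '2014-10-02T15:15:00', -524289),
    ('PLANT1.SERVER-0002.Disk_Free',  '2014-10-02T15:15:30', 7),
    ('PLANT1.SERVER-0009.Mem_Used',   '2014-10-03T09:00:00', 5);   -- outside the time window

-- Step 1: keep only the rows in the time window.
-- key is a reserved word in T-SQL, so it is bracketed.
select left(tag, 19) as [key], *
into #working
from #picomp2_src
where [time] between '2014-10-02T15:13:08' and '2014-10-02T15:18:37';

set statistics time on;
set statistics io on;

-- Step 2: baseline run, no index on #working yet.
select a.*
from #working a
where exists (select * from #working b where a.[key] = b.[key] and b.tag like '%CPU_Active')
  and a.tag not like '%Program%'
  and a.tag not like '%CPU_Active';

-- Step 3: create the index (its own cost is reported too), then run the same query again.
create clustered index cx_key on #working ([key]);

select a.*
from #working a
where exists (select * from #working b where a.[key] = b.[key] and b.tag like '%CPU_Active')
  and a.tag not like '%Program%'
  and a.tag not like '%CPU_Active';

set statistics time off;
set statistics io off;
```

With so few rows the reported numbers will be dominated by noise. The comparison only becomes meaningful when #working is filled from a realistic volume of data.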
