How to index Col1 + '|' + Col2 for the fastest search

Time: 2018-09-25 14:45:30

Tags: sql sql-server

I have a problem with a search: the query is very slow.

Select * 
  From table1
Where 
  condition1 = 1  -- bit column; SQL Server has no TRUE literal
  And Col1 + '|' + Col2 Not In 
  (Select Col1 + '|' + Col2 From table1 Where condition2 = 1)

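table1 is the same table in the inner and the outer query. Because there is no index on the two columns joined with '|', the query is very slow. How can I create an index on Col1 + '|' + Col2, or is there another, faster solution?

OK, adding the almost-original query.

Problem SQL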

Select * 
FROM tblRawPos 
WHERE  
  Source = 'SRC' 
  AND Cust = 'CST' 
  AND NOT CustAcct+'|'+tblRawPos.Cusip IN (SELECT CustAcct+'|'+Cusip FROM tblRawPos WHERE Source = 'CST' AND Custodian = 'CST'  )
  AND NOT Account+'|'+tblRawPos.Cusip IN (SELECT Account+'|'+Cusip FROM tblRawPos WHERE Source = 'CST' AND Custodian = 'CST' )

Changed according to the suggested solution (still slow)

Select R.* 
FROM tblRawPos R 
WHERE  
  R.Source = 'SRC' 
  AND R.Cust = 'CST' 
  AND Not Exists (SELECT 1 FROM tblRawPos RR WHERE RR.Source = 'SRC' 
                        AND RR.Cust = 'CST' 
                        AND (
                            ( RR.CustAcct + '|' + RR.Cusip = R.CustAcct + '|' + R.Cusip) OR (RR.Account + '|' + RR.Cusip = R.Account + '|' + R.Cusip)
                            )
                        )

3 Answers:

Answer 0: (score: 7)

I assume that the expression col1 + '|' + col2 is only meant to combine the two columns into one, not to treat "A|B" / "C" as equal to "A" / "B|C".
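As a quick illustration of that collision, the following one-liner (a sketch with made-up literal values, not data from the question) shows that two different column pairs produce the same concatenated string:

```sql
-- Both sides concatenate to 'A|B|C', so the pairs ('A|B', 'C') and ('A', 'B|C')
-- cannot be told apart once they are glued together with '|'.
SELECT CASE WHEN 'A|B' + '|' + 'C' = 'A' + '|' + 'B|C'
            THEN 'collide'
            ELSE 'distinct'
       END AS result;
```

Comparing the columns separately avoids this ambiguity, which is one more reason to prefer the column-wise conditions used in the answers here.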

Do not use NOT IN with a subquery. It does not do what you expect when any returned value is NULL. Use NOT EXISTS instead:

Select t1.* 
From table1 t1
Where not exists (select 1
                  from table1 tt1
                  where tt1.col1 = t1.col1 and tt1.col2 = t1.col2 and <some condition>
                 );
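The NULL pitfall can be reproduced with a small sketch. The table variable @t and its contents are hypothetical and exist only for this demonstration:

```sql
-- Hypothetical demo data; @t is not part of the original question.
DECLARE @t TABLE (id int, val varchar(10) NULL);
INSERT INTO @t (id, val) VALUES (1, 'A'), (2, 'B'), (3, NULL);

-- The subquery returns 'B' and NULL.
-- For val = 'A' the test is ('A' <> 'B') AND ('A' <> NULL), i.e. TRUE AND UNKNOWN,
-- which is UNKNOWN, so the row is filtered out. This query returns no rows at all.
SELECT o.id
FROM @t o
WHERE o.val NOT IN (SELECT i.val FROM @t i WHERE i.id IN (2, 3));

-- NOT EXISTS only asks whether a matching row exists.
-- 'B' has a match and is excluded; 'A' and the NULL row have none,
-- so this query returns ids 1 and 3.
SELECT o.id
FROM @t o
WHERE NOT EXISTS (SELECT 1
                  FROM @t i
                  WHERE i.id IN (2, 3)
                    AND i.val = o.val);
```

The whole script has to run as one batch, because a table variable is only visible within the batch that declares it.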

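For performance, you want an index on (col1, col2). You can append the columns used in the additional conditions as further columns of that index.

Another option that might be faster is a window function: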
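As a rough sketch of what such an index could look like: the index names are made up, and condition2 merely stands in for whatever column the additional condition filters on. You would create one of the two variants, not both:

```sql
-- Hypothetical index names; condition2 is a placeholder for the column(s)
-- referenced by <some condition>.

-- Variant 1: the filter column as a trailing key column.
CREATE INDEX idx_table1_col1_col2_cond
    ON table1 (col1, col2, condition2);

-- Variant 2: the filter column carried as a non-key included column.
CREATE INDEX idx_table1_col1_col2_inc
    ON table1 (col1, col2) INCLUDE (condition2);
```

Which variant the optimizer prefers depends on the data, so it is worth comparing the execution plans.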

select t1.*
from (select t1.*,
             sum(case when <conditions> then 1 else 0 end) over (partition by col1, col2) as cnt
      from table1 t1
     ) t1
where cnt = 0;
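Applied to the tblRawPos query from the question, the window-function idea might look roughly like the sketch below. It assumes the filter values shown in the question; cnt_custacct and cnt_account are hypothetical helper columns and will appear as two extra columns in the result:

```sql
-- Sketch only: counts, per (CustAcct, Cusip) and per (Account, Cusip),
-- how many rows satisfy the subquery filter from the question.
SELECT x.*
FROM (SELECT r.*,
             SUM(CASE WHEN r.Source = 'CST' AND r.Custodian = 'CST' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY r.CustAcct, r.Cusip) AS cnt_custacct,
             SUM(CASE WHEN r.Source = 'CST' AND r.Custodian = 'CST' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY r.Account, r.Cusip) AS cnt_account
      FROM tblRawPos r
     ) x
WHERE x.Source = 'SRC'
  AND x.Cust = 'CST'
  AND x.cnt_custacct = 0   -- no 'CST' row shares this CustAcct and Cusip
  AND x.cnt_account = 0;   -- no 'CST' row shares this Account and Cusip
```

One caveat: PARTITION BY puts NULL keys into the same group, whereas an equality comparison never matches NULLs, so the results can differ from the NOT EXISTS version if these columns are nullable.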

I seriously doubt that you need an index on col1 + '|' + col2, but you can create one by using a computed column:

alter table table1 add col_1_2 as (col1 + '|' + col2) persisted;

create index idx_table1_col_1_2 on table1(col_1_2);

Then you need to use that column in your code:

Select t1.* 
From table1 t1
Where not exists (select 1 from table1 tt1 where tt1.col_1_2 = t1.col_1_2 and . . .); 

However, I strongly recommend one of the first two approaches.

Answer 1: (score: 1)

Updated

Try using a join:

Select a.* 
FROM (
select * 
from tblRawPos
where Source = 'SRC' AND Cust = 'CST'
) a
left join 
(
select CustAcct, Account, Cusip  
from tblRawPos
where Source = 'CST' AND Custodian = 'CST'
) b on (a.CustAcct = b.CustAcct and a.Cusip = b.Cusip) or (a.Account = b.Account and a.Cusip = b.Cusip)
where b.Cusip is null;

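A join typically performs better than an IN subquery here, and the WHERE condition reduces the size of the "b" derived table, which should also improve efficiency.

Answer 2: (score: 1)

This kind of problem typically arises when the combination of two fields must be unique, e.g. model number and serial number: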

model_no | serial_no | combined
12-3     | 4-567     | 12-3-4-567
12-3-4   | 567       | 12-3-4-567

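However, your use of the pipe character makes it unlikely that you are dealing with this case. It seems that you would rather look for the same account and Cusip with
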
WHERE (CustAcct, Cusip) NOT IN (SELECT CustAcct, Cusip FROM tblRawPos WHERE ...)

which SQL Server does not allow. (Some other DBMSs do.)

So use EXISTS instead:

Select * 
FROM tblRawPos rp
WHERE Source = 'SRC' 
AND Cust = 'CST' 
AND NOT EXISTS
(
  SELECT *
  FROM tblRawPos other
  WHERE other.Cusip = rp.Cusip
  AND (other.CustAcct = rp.CustAcct OR other.Account = rp.Account)
  AND other.Source = 'CST'
  AND other.Custodian = 'CST' 
);

For this you should have at least the following index:

create index idx on tblRawPos (Cusip, Source, Custodian);

Even better would be a covering index:

create index idx1 on tblRawPos (Cusip, Source, Custodian, Account, CustAcct);

You should also try other indexes that start with Source and Custodian:

create index idx2 on tblRawPos (Source, Custodian, Cusip, Account, CustAcct);
create index idx3 on tblRawPos (Custodian, Source, Cusip, Account, CustAcct);

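You can try other variations: a different column order, fewer columns. Finally, check the execution plan to see which of these indexes the DBMS actually uses, and drop the others.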
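Besides reading the execution plan, one way to see which candidate indexes are actually used is the index usage DMV. The following is a sketch that assumes the index names from this answer and that you have permission to view server state. The counters are reset when the instance restarts, so run the real query a few times first:

```sql
-- Lists every index on tblRawPos with its usage counters for the current database.
-- Indexes that were never touched since the last restart show NULL counters.
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('tblRawPos');

-- Once an index turns out to be unused, it can be dropped, for example:
-- DROP INDEX idx3 ON tblRawPos;
```

An index with no seeks or scans but many updates only adds write cost, so it is a good candidate for removal.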