我有一个包含450,000行的表,其中varchar
列的长度可变(6到13个字符之间,分布不均匀)。我需要使用标准来加入另一个表,该标准表示目标表中的列以第一个表的列的值开头。
在我目前的测试样本中,我知道所有匹配都是6个字符,所以我正在使用t1.Digits = left(t2.Number, 6)
进行连接,这非常快(运行我的大查询几秒钟)。我的测试样本是10,000条记录,但在生产中,查询需要运行数十万条。
我也知道绝大多数记录总是6个字符的匹配,但我需要支持更多的匹配,否则我有时会得到重复的记录。问题是我已经尝试了以下所有这些方法,并且每个方法都比左边六个字符的简单连接慢得多。我从来没有让他们跑超过五分钟,但他们没有表现出任何终止的迹象:
t1.Digits = left(t2.Number, datalength(t1.Digits))
charindex(t1.Digits, t2.Number) = 1
DigitLength int
列添加到t1
,然后使用t1.Digits = left(t2.Number, t1.DigitLength)
t2.Number like t1.Digits + '%'
以上四种解决方案中的每一种都在理论上实现了我想要的,但对于我的目的来说运行速度太慢。
即使这些列中的值是数字,我也使用varchar
,因为在许多情况下需要保留前导零。在任何情况下,即使数据是字母数字的情况,也应该有一个快速的解决方案。
是否有人意识到一种非常快速的“开始”逻辑,其性能可与我过于简单的连接相媲美?
我在t1.Digits
列上有聚集索引吗?
以下是使用上述方法#4运行的执行计划:
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.0" Build="9.00.5000.00" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="10720" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="7471.7" StatementText="select c.FromNumber, c.ToNumber, d.Destination, d.Digits
from Converting c
--join CASH.CASH.dbo.DestinationLookup d on d.Digits = left(c.FromNumber, 6) 
join CASH.CASH.dbo.DestinationLookup d on c.FromNumber like d.Digits + '%' 
" StatementType="SELECT">
<StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" />
<QueryPlan DegreeOfParallelism="1" MemoryGrant="114" CachedPlanSize="99" CompileTime="36" CompileCPU="35" CompileMemory="312">
<RelOp AvgRowSize="77" EstimateCPU="174.861" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Inner Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="7471.7">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</OuterReferences>
<RelOp AvgRowSize="38" EstimateCPU="0.164714" EstimateIO="0.00281532" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Sort" NodeId="1" Parallel="false" PhysicalOp="Sort" EstimatedTotalSubtreeCost="0.340338">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1" ActualRewinds="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Sort Distinct="false">
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</OrderByColumn>
</OrderBy>
<RelOp AvgRowSize="38" EstimateCPU="0.00296763" EstimateIO="0.126907" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.129875">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</DefinedValue>
</DefinedValues>
<Object Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" />
</TableScan>
</RelOp>
</Sort>
</RelOp>
<RelOp AvgRowSize="48" EstimateCPU="0.00290986" EstimateIO="0.01" EstimateRebinds="1390" EstimateRewinds="9329" EstimateRows="15609.2" LogicalOp="Lazy Spool" NodeId="3" Parallel="false" PhysicalOp="Table Spool" EstimatedTotalSubtreeCost="7296.5">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1391" ActualRewinds="9329" ActualRows="10720" ActualEndOfScans="10720" ActualExecutions="10720" />
</RunTimeInformation>
<Spool>
<RelOp AvgRowSize="48" EstimateCPU="5.21308" EstimateIO="0" EstimateRebinds="1390" EstimateRewinds="0" EstimateRows="15609.2" LogicalOp="Compute Scalar" NodeId="4" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="7251.4">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Digits] as [d].[Digits]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
</Identifier>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Destination] as [d].[Destination]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</Identifier>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="48" EstimateCPU="5.21308" EstimateIO="0" EstimateRebinds="1390" EstimateRewinds="0" EstimateRows="15609.2" LogicalOp="Remote Query" NodeId="5" Parallel="false" PhysicalOp="Remote Query" EstimatedTotalSubtreeCost="7251.4">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="1391" ActualRewinds="0" ActualRows="1391" ActualEndOfScans="1391" ActualExecutions="1391" />
</RunTimeInformation>
<RemoteQuery RemoteSource="CASH" RemoteQuery="SELECT "Tbl1004"."Digits" "Col1021","Tbl1004"."Destination" "Col1022" FROM "CASH"."dbo"."DestinationLookup" "Tbl1004" WHERE ? like "Tbl1004"."Digits"+'%'" />
</RelOp>
</ComputeScalar>
</RelOp>
</Spool>
</RelOp>
</NestedLoops>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
这是加入简单左边的计划(t2.Number,6):
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.0" Build="9.00.5000.00" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="10720" StatementId="1" StatementOptmLevel="FULL" StatementSubTreeCost="15.1845" StatementText="select c.FromNumber, c.ToNumber, d.Destination, d.Digits
from Converting c
join CASH.CASH.dbo.DestinationLookup d on d.Digits = left(c.FromNumber, 6) " StatementType="SELECT">
<StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" />
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="105" CompileTime="60" CompileCPU="58" CompileMemory="360">
<RelOp AvgRowSize="77" EstimateCPU="0.0448096" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Inner Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="15.1845">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Column="Expr1005" />
</OuterReferences>
<RelOp AvgRowSize="43" EstimateCPU="0.001072" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Compute Scalar" NodeId="1" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.13985">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
<ColumnReference Column="Expr1005" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="substring([CASH].[dbo].[Converting].[FromNumber] as [c].[FromNumber],(1),(6))">
<Intrinsic FunctionName="substring">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(6)" />
</ScalarOperator>
</Intrinsic>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="38" EstimateCPU="0.011949" EstimateIO="0.126829" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="10720" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.138778">
<OutputList>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10720" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="FromNumber" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" Column="ToNumber" />
</DefinedValue>
</DefinedValues>
<Object Database="[CASH]" Schema="[dbo]" Table="[Converting]" Alias="[c]" />
</TableScan>
</RelOp>
</ComputeScalar>
</RelOp>
<RelOp AvgRowSize="48" EstimateCPU="0.000258212" EstimateIO="0.003125" EstimateRebinds="10580.9" EstimateRewinds="138.124" EstimateRows="1" LogicalOp="Lazy Spool" NodeId="6" Parallel="false" PhysicalOp="Index Spool" EstimatedTotalSubtreeCost="14.9998">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="830" ActualRewinds="9890" ActualRows="10720" ActualEndOfScans="0" ActualExecutions="10720" />
</RunTimeInformation>
<Spool>
<SeekPredicate>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Column="Expr1005" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="[Expr1005]">
<Identifier>
<ColumnReference Column="Expr1005" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekPredicate>
<RelOp AvgRowSize="48" EstimateCPU="0.0103333" EstimateIO="0" EstimateRebinds="1180" EstimateRewinds="0" EstimateRows="1" LogicalOp="Compute Scalar" NodeId="7" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="12.2037">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Digits] as [d].[Digits]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
</Identifier>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
<ScalarOperator ScalarString="[CASH].[CASH].[dbo].[DestinationLookup].[Destination] as [d].[Destination]">
<Identifier>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</Identifier>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="48" EstimateCPU="0.0103333" EstimateIO="0" EstimateRebinds="1180" EstimateRewinds="0" EstimateRows="1" LogicalOp="Remote Query" NodeId="8" Parallel="false" PhysicalOp="Remote Query" EstimatedTotalSubtreeCost="12.2037">
<OutputList>
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Digits" />
<ColumnReference Server="[CASH]" Database="[CASH]" Schema="[dbo]" Table="[DestinationLookup]" Alias="[d]" Column="Destination" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRebinds="456" ActualRewinds="0" ActualRows="456" ActualEndOfScans="0" ActualExecutions="456" />
</RunTimeInformation>
<RemoteQuery RemoteSource="CASH" RemoteQuery="SELECT "Tbl1004"."Digits" "Col1015","Tbl1004"."Destination" "Col1016" FROM "CASH"."dbo"."DestinationLookup" "Tbl1004" WHERE "Tbl1004"."Digits"=?" />
</RelOp>
</ComputeScalar>
</RelOp>
</Spool>
</RelOp>
</NestedLoops>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
更新:我一直无法找到理想的解决方案,但我发现了下一个最好的解决方案。似乎使用“like”对这两个表进行简单的简单查询在大约五秒钟内完成。因此,我没有尝试将连接填充到我的怪物查询中,它从未完成,我用它来创建一个临时查找表,然后我的怪物查询使用它。总之,大查询现在在9秒内完成,我在varchar join中支持可变长度字符串。
另一个有助于提高速度的方法是将t1中列的填充因子从80更改为100.此填充因子非常适合表格,因为它是一个静态参考表,每年只更改一次。
答案 0 :(得分:2)
这四个中性能最佳的解决方案是第四个。
让我们设置测试环境:
create table #t1 (digits varchar(10), filler char(5000) default(''))
create table #t2 (number varchar(10), filler char(5000) default(''))
go
insert #t1 (digits) values
('123'),('234'),('345'),('456'),('567')
insert #t2 (number) values
('1234'),('234'),('345689'),('45'),('567890')
go
create index ix_t2 on #t2(number);
go
现在,让我们执行四个语义相同的查询,但使用Query - &gt;包括启用的实际执行计划,以及SET STATISTICS IO ON
:
-- 1
select *
from #t1
inner join #t2
on #t1.digits = left(#t2.number, datalength(#t1.digits))
-- 2
select *
from #t1
inner join #t2
on charindex(#t1.Digits, #t2.Number) = 1
-- 3
select *
from #t1
inner join #t2
on charindex(#t1.digits, #t2.number) = 1
-- 4
select *
from #t1
inner join #t2
on #t2.number like #t1.digits + '%'
如您所见,执行计划为1,2和&amp; 3包括两个表上的表扫描运算符(包括第一个的附加计算标量运算符),但第四个查询对#t2上的索引执行索引查找。此外,如果检查统计信息io的输出,您将看到对于1,2和&amp;的#t2(带索引的表)的逻辑读取度量。 3为25,但第四个只有14(当然,行数越多,数字就越高)。
答案 1 :(得分:1)
在table1.digits
上建立索引。然后尝试以下方法:
select t2.*, t1.<whatever>
from table1 t2 cross apply
(select top 1 <whatever>
from table1 t1
where t1.digits <= t2.number
order by t1.digits desc
) t1;
SQL Server有时比常规连接更优化“应用”查询。在这种情况下,它可能会发现索引对where
和order by
都有用,并且可以有效地进行。 (我也认为同样适用于相关子查询。)