Question

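I have a list of about 200,000 records with an EntityID column, which I load into a temp table variable.

I want to insert every record from the temp table variable whose EntityID does not already exist in the dbo.EntityRows table. The dbo.EntityRows table contains about 800,000 records.

The process is very slow compared to when the dbo.EntityRows table had about 500,000 records.

My first guess is that the NOT EXISTS clause is the cause: for every row in the temp variable, the entire 800k rows of dbo.EntityRows have to be scanned to determine whether the row exists.

Question: Is there another way to run this comparison check without NOT EXISTS, which carries a huge cost that will only get worse as dbo.EntityRows keeps growing?

EDIT: Thanks for the comments. Here is the query. I've left out the part after the IF NOT EXISTS check; after it, if the row does not exist, I insert into 4 tables.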

declare @EntityCount int, @Counter int, @ExistsCounter int, @AddedCounter int
declare @LogID int
declare @YdataInsertedEntityID int, @YdataSearchParametersID int
declare @CurrentEntityID int
declare @CurrentName nvarchar(80)
declare @CurrentSearchParametersID int, @CurrentSearchParametersIDAlreadyDone int 
declare @Entities table 
(
    Id int identity,
    EntityID int,
    NameID nvarchar(80), 
    SearchParametersID int
)

insert into @Entities
select EntityID, NameID, SearchParametersID from YdataArvixe.dbo.Entity order by EntityID;


set @EntityCount = (select count(*) from @Entities);
set @Counter = 1;
set @LogID = null;
set @ExistsCounter = 0;
set @AddedCounter = 0;
set @CurrentSearchParametersIDAlreadyDone = -1;

while (@EntityCount >= @Counter)
begin
    -- Fetch the current row's values with one lookup instead of three
    select @CurrentEntityID = EntityID,
           @CurrentName = NameID,
           @CurrentSearchParametersID = SearchParametersID
    from @Entities
    where Id = @Counter;

    -- Per-row existence check; without an index on NameID this
    -- scans ydata.dbo.entity on every iteration of the loop
    if not exists (select 1 from ydata.dbo.entity
                    where NameID = @CurrentName)
    begin
       -- I insert into 4 tables IF NOT EXISTS = true
    end

    -- (any further per-row processing was omitted in the original post)
    set @Counter = @Counter + 1;
end

Answer 1

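I'm not sure, but the check can be written in the following ways. Here t stands for the alias of the outer temp table; the subquery has to reference it, because an unqualified EntityID would resolve to er.EntityID and make the condition always true. The first two forms test whether the row exists, so use = 0 or IS NULL respectively for the "does not exist" case.

(SELECT COUNT(er.EntityID) FROM dbo.EntityRows er WHERE er.EntityID = t.EntityID) <> 0

(SELECT TOP (1) er.EntityID FROM dbo.EntityRows er WHERE er.EntityID = t.EntityID) IS NOT NULL

NOT EXISTS (SELECT 1 FROM dbo.EntityRows er WHERE er.EntityID = t.EntityID)

t.EntityID NOT IN (SELECT er.EntityID FROM dbo.EntityRows er)

In my view, though, the COUNT form should perform well. Also, as Felix Pamittan said, an index will help performance.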
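
A minimal sketch of how these forms compare, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: the engines differ, but these predicates follow standard SQL semantics in both). The table name TempEntities and the sample rows are hypothetical, and the COUNT form is written as = 0 so that all four queries ask "which temp rows are missing". It also shows the one case where NOT IN behaves differently: once the subquery returns a NULL, NOT IN matches nothing.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EntityRows (EntityID INTEGER);
    CREATE TABLE TempEntities (EntityID INTEGER);  -- hypothetical temp table
    INSERT INTO EntityRows VALUES (1), (2), (3);
    INSERT INTO TempEntities VALUES (2), (3), (4), (5);
""")

# Each query returns the temp rows whose EntityID is NOT in EntityRows.
# Note that every subquery is correlated to the outer alias t.
queries = {
    "not_exists": """
        SELECT t.EntityID FROM TempEntities t
        WHERE NOT EXISTS (SELECT 1 FROM EntityRows er WHERE er.EntityID = t.EntityID)
    """,
    "count_zero": """
        SELECT t.EntityID FROM TempEntities t
        WHERE (SELECT COUNT(*) FROM EntityRows er WHERE er.EntityID = t.EntityID) = 0
    """,
    "not_in": """
        SELECT t.EntityID FROM TempEntities t
        WHERE t.EntityID NOT IN (SELECT er.EntityID FROM EntityRows er)
    """,
    "left_join": """
        SELECT t.EntityID FROM TempEntities t
        LEFT JOIN EntityRows er ON er.EntityID = t.EntityID
        WHERE er.EntityID IS NULL
    """,
}


def run_all():
    """Run every query and return its sorted EntityIDs, keyed by query name."""
    return {name: sorted(row[0] for row in conn.execute(sql))
            for name, sql in queries.items()}


before_null = run_all()  # all four agree: [4, 5]

# Pitfall: one NULL in the subquery makes "x NOT IN (..., NULL)" evaluate
# to UNKNOWN instead of TRUE, so NOT IN returns no rows at all.
conn.execute("INSERT INTO EntityRows VALUES (NULL)")
after_null = run_all()

print(before_null)
print(after_null)
```

NOT EXISTS, the COUNT check, and the LEFT JOIN form are unaffected by the NULL, which is one reason NOT EXISTS is usually preferred over NOT IN when the compared column is nullable.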

Answer 2

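As @gotqn said, use a temp table first. After populating it, create an index on EntityID. If EntityRows does not have an index on EntityID, create one there too.

I do this kind of thing a lot, and I usually use the following pattern: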

INSERT INTO EntityRows (
    EntityId, ...
)

SELECT T.EntityId, ...
FROM #tempTable T
LEFT JOIN EntityRows E
ON T.EntityID = E.EntityID
WHERE E.EntityID IS NULL

Leave a comment if you want more information.
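
A minimal sketch of this pattern, again assuming SQLite (through Python's sqlite3 module) as a stand-in for SQL Server; tempTable, the Name column, and the data are hypothetical. The whole load is one set-based INSERT ... SELECT, so the missing rows are found by a single anti-join rather than by a separate NOT EXISTS lookup on every pass of a WHILE loop.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption); the TEMP table plays the
# role of #tempTable, and Name is a hypothetical extra column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EntityRows (EntityID INTEGER, Name TEXT);
    CREATE TEMP TABLE tempTable (EntityID INTEGER, Name TEXT);
""")

conn.executemany("INSERT INTO EntityRows VALUES (?, ?)",
                 [(i, f"existing-{i}") for i in range(0, 1000, 2)])  # even IDs exist
conn.executemany("INSERT INTO tempTable VALUES (?, ?)",
                 [(i, f"new-{i}") for i in range(1000)])            # IDs 0..999

# Index both sides of the join after loading, as suggested above.
conn.execute("CREATE INDEX IX_tempTable_EntityID ON tempTable (EntityID)")
conn.execute("CREATE INDEX IX_EntityRows_EntityID ON EntityRows (EntityID)")

before = conn.execute("SELECT COUNT(*) FROM EntityRows").fetchone()[0]

# One set-based statement: keep only temp rows with no matching EntityRows row.
conn.execute("""
    INSERT INTO EntityRows (EntityID, Name)
    SELECT T.EntityID, T.Name
    FROM tempTable T
    LEFT JOIN EntityRows E ON T.EntityID = E.EntityID
    WHERE E.EntityID IS NULL
""")
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM EntityRows").fetchone()[0]
inserted = total - before

# Sanity check: no EntityID should now appear more than once.
duplicates = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT EntityID FROM EntityRows GROUP BY EntityID HAVING COUNT(*) > 1
    )
""").fetchone()[0]

print(inserted, total, duplicates)
```

Only the odd IDs are missing from EntityRows, so the anti-join inserts exactly those rows and the duplicate check stays at zero.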

Answer 3

Well, the answer was pretty basic. @Felix and @TT had the right suggestions. Thanks!

I put a nonclustered index on the NameID field of ydata.dbo.entity.

So it can now use the index to handle the NOT EXISTS part quickly instead of scanning the whole dbo.entity table. It's moving fast again.
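
A minimal sketch of why the index matters, assuming SQLite as a stand-in for SQL Server (its EXPLAIN QUERY PLAN plays the role of SQL Server's execution plan). The table name staged and the data are hypothetical, and automatic indexing is switched off so that SQLite does not quietly build a temporary index of its own. Without an index on NameID, the plan scans entity for the correlated lookup; once IX_entity_NameID (a hypothetical name) exists, the same NOT EXISTS query becomes an index search.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption); "staged" is a hypothetical
# name for the temp table and "entity" mirrors ydata.dbo.entity.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA automatic_index = OFF")  # no throwaway indexes from the planner
conn.executescript("""
    CREATE TABLE entity (EntityID INTEGER PRIMARY KEY, NameID TEXT);
    CREATE TABLE staged (NameID TEXT);
""")
conn.executemany("INSERT INTO entity (NameID) VALUES (?)",
                 [(f"name-{i}",) for i in range(800)])
conn.executemany("INSERT INTO staged (NameID) VALUES (?)",
                 [(f"name-{i}",) for i in range(700, 900)])

QUERY = """
    SELECT staged.NameID FROM staged
    WHERE NOT EXISTS (SELECT 1 FROM entity WHERE entity.NameID = staged.NameID)
"""


def plan():
    """Return the human-readable step descriptions (last column of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


plan_without_index = plan()  # expect a SCAN of entity for the NOT EXISTS lookup
conn.execute("CREATE INDEX IX_entity_NameID ON entity (NameID)")
plan_with_index = plan()     # expect a SEARCH using IX_entity_NameID instead

missing = [row[0] for row in conn.execute(QUERY)]

for step in plan_without_index + ["--"] + plan_with_index:
    print(step)
print(len(missing), "rows would be inserted")
```

The exact wording of each plan step differs between SQLite versions, but the switch from scanning entity to searching it through the index is the same effect the nonclustered index on NameID has in SQL Server.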
