Question

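I'm working with a large amount of data: 6 million rows. I need the query to run as fast as possible, but I'm out of ideas for further optimization. I've already removed 3 subqueries, which took it from over 11 hours down to 35 minutes on a modest dataset of 100k rows. See below!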

declare @UserId uniqueidentifier;
set @UserId = '936DA01F-9ABD-4d9d-80C7-02AF85C822A8';


select
    temp.Address_Line1,
    temp.Cell_Phone_Number,
    temp.City,
    temp.CPM_delt_acd,
    temp.CPM_delt_date,
    temp.Customer_Id,
    temp.Customer_Type,
    temp.Date_Birth,
    temp.Email_Business,
    temp.Email_Home,
    temp.First_Name,
    temp.Geo,
    temp.Home_Phone_Number,
    temp.Last_Name,
    temp.Link_Customer_Id,
    temp.Middle_Name,
    temp.Naics_Code,
    temp.Office_Phone_Number,
    temp.St,
    temp.Suffix,
    temp.Tin,
    temp.TIN_Indicator,
    temp.Zip_Code,

    crm_c.contactid as CrmRecordId, 
    crm_c.ownerid as OldOwnerId, 
    crm_c.ext_profiletype as old_profileType,
    coalesce(crm_fim.ownerid, @UserId) as OwnerId,
    2 as profileType,

    case 
        when
            (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default 
            and temp.Last_Name = crm_c.lastname collate database_default)
        then
            ('Tin/LastName: '+temp.Tin + '/' + temp.Last_Name)
        when
            (temp.Customer_ID = crm_c.ext_customerid collate database_default)
        then
            ('Customer_ID: '+temp.Customer_ID)
        else
            ('New Customer: '+temp.Customer_ID)
    end as FriendlyName,

    case 
        when
            (temp.Customer_ID = crm_c.ext_customerid collate database_default)
        then
            0
        else
            1
    end as ForceFieldLock

from DailyProfile_Current temp

left join crm_contact crm_c 
    on (temp.Customer_ID = crm_c.ext_customerid collate database_default 
        or (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default 
        and temp.Last_Name = crm_c.lastname collate database_default))
    and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode    

left outer join crm_ext_ImportMapping crm_fim 
    on temp.Geo = crm_fim.ext_geocode collate database_default 
    and 0 = crm_fim.deletionstatecode and 0 = crm_fim.statecode

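Here crm_contact is a synonym that points to a view in another database. The view pulls data from a contact table and a contactextension table, and I need data from both. I could split this into two joins if necessary. In general, columns that begin with "ext_" come from the extension portion of the crm_contact view.

When I run this against 100k rows in the DailyProfile_Current table, it takes about 35 minutes. That table is a bunch of nvarchar(200) columns that a flat file gets dumped into. It's awful, but it's what I inherited. I wonder whether using real data types would help, but I'd also like possible solutions that don't involve changing them.

If the DailyProfile_Current table is full of rows that don't match the join criteria, this runs very fast. If the table is full of rows that do match the join criteria, it is very slow.

There are indexes on Customer_ID and Geo in the temp table. There are also various indexes on the crm_contact tables. I don't know how much indexes can help on nvarchar(200) columns, though.

In case it matters, I'm using SQL Server 2005.

Any ideas are appreciated.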

Answer 1

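I would definitely break it into 2 queries, because the OR in the join condition can be slow at times. Also, put a nonclustered index on these columns (one index per line):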

DailyProfile_Current:
Customer_ID 
Tin, Last_Name
Geo 

crm_contact:
ext_customerid, deletionstatecode, statecode
ext_retail_prime_taxid, lastname, deletionstatecode, statecode

crm_ext_ImportMapping:
ext_geocode,deletionstatecode,statecode
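
As a rough sketch, the DailyProfile_Current part of that list could be written as below. The index names are made up for illustration, and the question says indexes on Customer_ID and Geo already exist, so those two statements would be skipped in that case. Since crm_contact is a synonym for a view in another database, its indexes would have to be created on the underlying contact and contactextension tables, whose names are not given in the question, so they are left out here.

```sql
-- Sketch only: the IX_... index names are hypothetical.
-- Skip any statement whose index already exists on the table.

create nonclustered index IX_DailyProfile_Current_Customer_ID
    on DailyProfile_Current (Customer_ID);

-- Two nvarchar(200) key columns total at most 800 bytes,
-- which stays under SQL Server 2005's 900-byte index key limit.
create nonclustered index IX_DailyProfile_Current_Tin_Last_Name
    on DailyProfile_Current (Tin, Last_Name);

create nonclustered index IX_DailyProfile_Current_Geo
    on DailyProfile_Current (Geo);
```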

Answer 2

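Why not try running it through the Query Profiler? It might give you some hints. Alternatively, include the execution plan in your query results and have a look at that.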
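
One way to capture that information from a query window, as a minimal sketch: the SET STATISTICS options below are available in SQL Server 2005, and the count query is only a stand-in for the real statement.

```sql
-- Report I/O and timing per statement, and return the actual plan as XML.
set statistics io on;
set statistics time on;
set statistics xml on;

-- Stand-in statement: replace with the query under investigation.
select count(*) from DailyProfile_Current;

set statistics xml off;
set statistics time off;
set statistics io off;
```

The I/O and timing figures appear in the Messages output, and the plan comes back as an extra result set.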

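From the look of the query, I can only suggest splitting it in two: move the OR out of the JOIN condition, write one query per match type, and use UNION ALL to merge the results. At the very least, that might show you which of the two JOIN types is the slow one, and you can work from there.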
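
A sketch of what that split might look like, under some assumptions: the column list is cut down to two columns for brevity, the join to crm_ext_ImportMapping is omitted, and a third branch is added so that rows with no match are still returned, as the original LEFT JOIN does. This is not a tested drop-in replacement.

```sql
-- Branch 1: contacts matched on Customer_ID.
select temp.Customer_Id, crm_c.contactid as CrmRecordId
from DailyProfile_Current temp
inner join crm_contact crm_c
    on temp.Customer_ID = crm_c.ext_customerid collate database_default
    and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode

union all

-- Branch 2: contacts matched on Tin + Last_Name,
-- excluding pairs that branch 1 already returned.
select temp.Customer_Id, crm_c.contactid
from DailyProfile_Current temp
inner join crm_contact crm_c
    on temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
    and temp.Last_Name = crm_c.lastname collate database_default
    and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode
where temp.Customer_ID is null
   or crm_c.ext_customerid is null
   or temp.Customer_ID <> crm_c.ext_customerid collate database_default

union all

-- Branch 3: rows with no matching contact at all (the LEFT JOIN case).
select temp.Customer_Id, cast(null as uniqueidentifier)
from DailyProfile_Current temp
where not exists (
        select 1 from crm_contact crm_c
        where temp.Customer_ID = crm_c.ext_customerid collate database_default
          and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode)
  and not exists (
        select 1 from crm_contact crm_c
        where temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
          and temp.Last_Name = crm_c.lastname collate database_default
          and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode);
```

Running each branch on its own should show which match type accounts for most of the time.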

Answer 3

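Run it in Query Analyzer and let it create indexes for you. I'm guessing you have at least SQL 2000. Why not move some of the functionality out into code? For instance, you could do the CASE statements in code. That assumes you are writing code that issues the query. I have found that breaking up queries and taking on some of the load in code makes a dramatic difference in run time.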

How do I optimize this query for time performance?
