Question

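I wrote this very simple query, and it has now been running for more than 16 hours. I have tried various ways to optimize it, but I don't know how to improve it further. Please help.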

select 
    a.*, b.TM_IN_XIT 
into
    scratch.dbo.ab_peak2 
from 
    scratch.dbo.ab_peak1 a 
left join 
    Scratch.dbo.ab_tnt b on a.DC_ZIP between b.ORIG_ZIP_low and b.ORIG_ZIP_high
                         and a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high
                         and a.Carr_Mode = b.Mode;

The table scratch.dbo.ab_peak1 looks like this and has ~7 million records:

+-----------------------------------------------+
| ShipmentNumber  DC_ZIP  Destination Carr_Mode |
+-----------------------------------------------+
| 252838748       60622       10016      A      |
| 252731857       60622       40517      A      |
| 252685087       60622       91601      B      |
| 252574905       60622       7017       B      |
| 252877256       60622       97230      A      |
| 254791362       20166       54971      B      |
| 255866277       60622       19131      A      |
| 255728088       60622       27713      B      |
| 255614555       60622       10009      A      |
| 255823071       60622       33556      B      |
+-----------------------------------------------+

The table Scratch.dbo.ab_tnt looks like this and has ~15,000 records:

+-----------------------------------------------------------------------------------+
| Mode    ORIG_ZIP_low    ORIG_ZIP_high   DEST_ZIP_low    DEST_ZIP_high   TM_IN_XIT |
+-----------------------------------------------------------------------------------+
|   A        41042            41042          62556            62556          2      |
|   B        41042            41042          62556            62556          3      |
|   A        41042            41042          62557            62557          1      |
|   B        41042            41042          62557            62557          2      |
|   A        41042            41042          62558            62563          2      |
|   B        41042            41042          62558            62563          3      |
|   A        41042            41042          62565            62567          1      |
|   B        41042            41042          62565            62567          2      |
|   A        41042            41042          62568            62570          2      |
|   B        41042            41042          62568            62570          3      |
+-----------------------------------------------------------------------------------+

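What I am trying to achieve: "a" is the shipments table, and "b" is a table holding the transit time for every origin-destination combination. Table "b" is structured around ZIP ranges, as shown above. I am trying to bring the transit time into table "a" for each shipment by looking it up in table "b".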

What I have already tried:

Trimmed the TNT table to keep only ZIP codes >= min(ZIP codes in the PEAK1 table) and <= max(ZIP codes in the PEAK1 table).
Created indexes on all columns of the TNT table.

Any other suggestions?

Answer 1

For this query:

select a.*, b.TM_IN_XIT 
into scratch.dbo.ab_peak2 
from scratch.dbo.ab_peak1 a left join
     Scratch.dbo.ab_tnt b
     on a.DC_ZIP between b.ORIG_ZIP_low and b.ORIG_ZIP_high and
        a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high and
        a.Carr_Mode = b.Mode;

The first thing to try is an index, specifically on ab_tnt(mode, orig_zip_low, orig_zip_high, dest_zip_low, dest_zip_high, tm_in_xit).
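
As a minimal sketch, that index could be created along the following lines. The index name ix_ab_tnt_cover is made up for illustration; the column names are the ones shown for ab_tnt in the question.

```sql
-- Hypothetical index name; the columns are those of ab_tnt from the question.
-- Mode comes first because it is the only equality predicate in the join.
-- TM_IN_XIT is part of the index so the join can be answered from the index alone.
create index ix_ab_tnt_cover
    on Scratch.dbo.ab_tnt (Mode, ORIG_ZIP_low, ORIG_ZIP_high,
                           DEST_ZIP_low, DEST_ZIP_high, TM_IN_XIT);
```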

I might also be inclined to write the query this way:

select a.*,
       (case when a.DC_ZIP <= b.ORIG_ZIP_high and a.Destination <= b.DEST_ZIP_high
             then b.TM_IN_XIT
        end) as TM_IN_XIT
into scratch.dbo.ab_peak2 
from scratch.dbo.ab_peak1 a outer apply
     (select top 1 b.*
      from Scratch.dbo.ab_tnt b
      where a.DC_ZIP >= b.ORIG_ZIP_low and
            a.Destination >= b.DEST_ZIP_low and
            a.Carr_Mode = b.Mode
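      -- descending, so that top 1 is the nearest range start at or below the shipment's ZIP codes
      order by b.ORIG_ZIP_low desc, b.DEST_ZIP_low desc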
     ) b;

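Note that this is not exactly the same query. It returns the first ZIP code range that could possibly match. The idea is that the subquery can make good use of an index on ab_tnt(mode, ORIG_ZIP_low, DEST_ZIP_low).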

The case expression then determines whether it really is a match.
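
An index of roughly this shape is what the subquery would seek on. This is only a sketch: the name ix_ab_tnt_low is hypothetical, and the include list is an assumption that goes beyond the three key columns named above. It is added so that the case expression can read the upper bounds and TM_IN_XIT without a lookup back into the table.

```sql
-- Hypothetical index name. Key columns follow the answer's suggestion;
-- the included columns are an assumption, added to make the index covering.
create index ix_ab_tnt_low
    on Scratch.dbo.ab_tnt (Mode, ORIG_ZIP_low, DEST_ZIP_low)
    include (ORIG_ZIP_high, DEST_ZIP_high, TM_IN_XIT);
```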

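I have used this logic very successfully in one dimension (for example, for handling IP ranges). I have not used it in two dimensions, but if your current query has been running for the better part of a day, it is worth a try.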
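
To illustrate the one-dimensional case, here is a small self-contained sketch. The temp tables #ip_ranges and #visits, their columns, and the sample rows are all invented for the example, and it assumes the ranges do not overlap.

```sql
-- Hypothetical sample data: two non-overlapping ranges and three lookups.
create table #ip_ranges (ip_low bigint not null primary key,
                         ip_high bigint not null,
                         country varchar(2) not null);
create table #visits (visit_id int not null primary key, ip bigint not null);

insert into #ip_ranges values (100, 199, 'US'), (300, 399, 'DE');
insert into #visits values (1, 150), (2, 250), (3, 300);

select v.visit_id, v.ip,
       -- the candidate only counts if the value is also below the range's upper bound
       (case when v.ip <= r.ip_high then r.country end) as country
from #visits v outer apply
     (select top 1 r.ip_high, r.country
      from #ip_ranges r
      where r.ip_low <= v.ip
      order by r.ip_low desc   -- nearest range start at or below the value
     ) r;
-- visit 1 falls in 100-199 (US), visit 2 falls in the gap (NULL), visit 3 starts 300-399 (DE)
```

The subquery needs only a single seek on ip_low per row, which is why the pattern tends to be fast in one dimension. In two dimensions the nearest origin range start is not guaranteed to carry the right destination range, so the results should be compared against the original join on a sample.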

Note: to test performance, you can run against a subset of the records by using (select top 1000 * from scratch.dbo.ab_peak1) a in place of the table.
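
A timing run on such a subset might look like the sketch below. Dropping the into clause and wrapping the query in set statistics time are choices made for this example, so that the test creates no table and reports elapsed time.

```sql
set statistics time on;   -- report CPU and elapsed time with the query messages

-- Same join as the original query, but over 1000 arbitrary shipments
-- and without "into", so nothing is written to scratch.
select a.*, b.TM_IN_XIT
from (select top 1000 * from scratch.dbo.ab_peak1) a left join
     Scratch.dbo.ab_tnt b
     on a.DC_ZIP between b.ORIG_ZIP_low and b.ORIG_ZIP_high and
        a.Destination between b.DEST_ZIP_low and b.DEST_ZIP_high and
        a.Carr_Mode = b.Mode;

set statistics time off;
```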

Answer 2

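Just a thought, but have you tried enumerating all the possible combinations in ab_tnt and joining the result to ab_peak1? Below is an example using common table expressions, although temp tables might work better. Also, this answer assumes you have an Integers table.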

;with dest as (
    select dest_zip = i.I, tnt.Mode, orig_zip_low, orig_zip_high, tm_in_xit
    from Integers i
         join ab_tnt tnt on i.I between tnt.dest_zip_low and tnt.dest_zip_high
)
, tnt as (
    select dest.Mode, dest_zip, orig_zip = i.I, dest.tm_in_xit
    from Integers i
         join dest on i.I between dest.orig_zip_low and dest.orig_zip_high
)
select * 
from ab_peak1
    join tnt 
         on ab_peak1.DC_ZIP = tnt.orig_zip
        and ab_peak1.Destination = tnt.dest_zip
        and ab_peak1.Carr_Mode = tnt.Mode
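
If no Integers table exists yet, one possible way to build it, and to materialize the expanded ranges in a temp table instead of CTEs, is sketched below. It assumes the ZIP columns are integers with at most five digits; the names #tnt_expanded and ix_tnt_expanded are hypothetical. The expanded table holds one row per mode, origin ZIP, and destination ZIP covered by a range, so it can grow very large when ranges are wide in both dimensions.

```sql
-- Helper table: one row per possible 5-digit ZIP value, 0 through 99999.
create table Integers (I int not null primary key);

with digits as (
    select d from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as v(d)
)
insert into Integers (I)
select d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d + 10000 * d5.d
from digits d1
     cross join digits d2
     cross join digits d3
     cross join digits d4
     cross join digits d5;

-- Expand every range row of ab_tnt into individual (origin, destination) pairs.
select t.Mode, orig_zip = o.I, dest_zip = d.I, t.TM_IN_XIT
into #tnt_expanded
from Scratch.dbo.ab_tnt t
     join Integers o on o.I between t.ORIG_ZIP_low and t.ORIG_ZIP_high
     join Integers d on d.I between t.DEST_ZIP_low and t.DEST_ZIP_high;

-- Index the expanded table so the final join is a plain equality lookup.
create clustered index ix_tnt_expanded
    on #tnt_expanded (Mode, orig_zip, dest_zip);

-- Left join keeps shipments that have no matching range, as in the original query.
select a.*, x.TM_IN_XIT
from scratch.dbo.ab_peak1 a left join
     #tnt_expanded x
     on x.Mode = a.Carr_Mode and
        x.orig_zip = a.DC_ZIP and
        x.dest_zip = a.Destination;
```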

Need suggestions to further optimize this SQL Server query
