Optimizing a slow, complex remote SQL query

Date: 2014-07-08 19:48:43

Tags: sql sql-server database performance query-optimization

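I need to retrieve roughly 1.5 million matches from a remote database. Two tables, ITEM1 and ITEM2, hold dated item information. Every item should have at least one record in ITEM1, and the same item may have zero or more records in ITEM2. I have to find the most recent record from either table, and if the item exists in ITEM2, use that information instead of ITEM1's. #TEMPA is the table holding the initial ~1.5 million ItemNumbers.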

Here is the query:

SELECT GETDATE() AS DateElement, A.SourceStore, COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
-- latest ITEM1 row for each item (correlated MAX subquery per item)
INNER JOIN REMOTEDB.ITEM1 CO
    ON  CO.item_id = A.ItemNumber 
    AND CO.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM1 CO2 WHERE CO2.item_id = A.ItemNumber) 
-- latest ITEM2 row, if any; COALESCE prefers its cost over ITEM1's
LEFT JOIN REMOTEDB.ITEM2 FR
    ON  FR.item_id = A.ItemNumber 
    AND FR.month_ending >= (SELECT MAX(month_ending) FROM REMOTEDB.ITEM2 FR2 WHERE FR2.item_id = A.ItemNumber)
WHERE CO.item_id IS NOT NULL 
    OR FR.item_id IS NOT NULL

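Both ITEM tables have a unique clustered index on (item_id, month_ending). I realize the subqueries may be a big performance hit, but I can't think of any other way to do this, since each item can have a different maximum month_ending date. The query currently returns the correct information, but it takes about 2.6 hours. Any help optimizing it would be appreciated.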
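To make the intended semantics concrete, here is a small sketch that runs the same correlated-MAX pattern on toy data. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server, a plain TEMPA table instead of #TEMPA, no GETDATE() column, and invented sample rows; it only checks what the query returns, not how fast it runs.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the remote SQL Server tables.
SETUP = """
CREATE TABLE TEMPA (ItemNumber INTEGER, SourceStore TEXT);
CREATE TABLE ITEM1 (item_id INTEGER, month_ending TEXT, original_cost REAL,
                    PRIMARY KEY (item_id, month_ending));
CREATE TABLE ITEM2 (item_id INTEGER, month_ending TEXT, original_cost REAL,
                    PRIMARY KEY (item_id, month_ending));
INSERT INTO TEMPA VALUES (1, 'S1'), (2, 'S2');
INSERT INTO ITEM1 VALUES (1, '2014-05-31', 10.0), (1, '2014-06-30', 11.0),
                         (2, '2014-06-30', 20.0);
INSERT INTO ITEM2 VALUES (1, '2014-04-30', 8.0), (1, '2014-05-31', 9.0);
"""

# Same shape as the question's query: one correlated MAX subquery per table.
QUERY = """
SELECT A.ItemNumber, A.SourceStore,
       COALESCE(FR.original_cost, CO.original_cost) AS Cost
FROM TEMPA A
JOIN ITEM1 CO
  ON CO.item_id = A.ItemNumber
 AND CO.month_ending = (SELECT MAX(month_ending) FROM ITEM1 CO2
                        WHERE CO2.item_id = A.ItemNumber)
LEFT JOIN ITEM2 FR
  ON FR.item_id = A.ItemNumber
 AND FR.month_ending = (SELECT MAX(month_ending) FROM ITEM2 FR2
                        WHERE FR2.item_id = A.ItemNumber)
ORDER BY A.ItemNumber
"""


def latest_costs():
    """Return (ItemNumber, SourceStore, Cost) rows for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in latest_costs():
        print(row)
```

With this data, item 1 gets ITEM2's latest cost (9.0) even though ITEM1 has a newer month, because the COALESCE prefers ITEM2 whenever it has any row for the item. Using `=` instead of `>=` against the MAX gives the same rows, since nothing can exceed the maximum.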

Edit: I should mention that the query is also running under READ UNCOMMITTED.

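I tried both answer queries that use ROW_NUMBER. Run directly on the remote server, each took about 20 minutes, while my original query finished there in about 2 minutes. Over the linked server, my original query runs in about 17 minutes; I cancelled the other queries once they passed an hour.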

Thoughts?

Answer queries: http://content.screencast.com/users/CWhittem/folders/Jing/media/ed55352b-9799-4dec-94f0-764e2670884f/2014-07-09_0957.png

Original query: http://content.screencast.com/users/CWhittem/folders/Jing/media/4991aa7d-a05c-4fb1-afad-52b07f896d5e/2014-07-09_1014.png

Thanks!

3 answers:

Answer 0 (score: 3)

Rewrite the correlated MAX subqueries using ROW_NUMBER:

SELECT GETDATE() AS DateElement, A.SourceStore, 
   COALESCE(FR.original_cost,CO.original_cost) AS Cost
FROM #TEMPA A
INNER JOIN
  (
    SELECT *
    FROM
     (
      SELECT original_cost,
          item_id,
          ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY month_ending DESC) AS rn
      FROM REMOTEDB.ITEM1 
     ) as dt
    WHERE rn = 1
  ) AS CO
 ON  CO.item_id = A.ItemNumber 
LEFT JOIN 
  (
    SELECT *
    FROM
     (
      SELECT original_cost,
          item_id,
          ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY month_ending DESC) AS rn
      FROM REMOTEDB.ITEM2 
     ) as dt
    WHERE rn = 1
   ) as FR
    ON  FR.item_id = A.ItemNumber
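As a rough check that the ROW_NUMBER rewrite returns the same rows as the correlated-MAX version, the sketch below runs it on toy data. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module as a stand-in for SQL Server, a plain TEMPA table instead of #TEMPA, no GETDATE() column, and invented sample rows.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the remote SQL Server tables.
SETUP = """
CREATE TABLE TEMPA (ItemNumber INTEGER, SourceStore TEXT);
CREATE TABLE ITEM1 (item_id INTEGER, month_ending TEXT, original_cost REAL);
CREATE TABLE ITEM2 (item_id INTEGER, month_ending TEXT, original_cost REAL);
INSERT INTO TEMPA VALUES (1, 'S1'), (2, 'S2');
INSERT INTO ITEM1 VALUES (1, '2014-05-31', 10.0), (1, '2014-06-30', 11.0),
                         (2, '2014-06-30', 20.0);
INSERT INTO ITEM2 VALUES (1, '2014-04-30', 8.0), (1, '2014-05-31', 9.0);
"""

# rn = 1 marks each item's latest month within its own table.
QUERY = """
SELECT A.ItemNumber, A.SourceStore,
       COALESCE(FR.original_cost, CO.original_cost) AS Cost
FROM TEMPA A
JOIN (SELECT item_id, original_cost
      FROM (SELECT item_id, original_cost,
                   ROW_NUMBER() OVER (PARTITION BY item_id
                                      ORDER BY month_ending DESC) AS rn
            FROM ITEM1) AS dt
      WHERE rn = 1) AS CO
  ON CO.item_id = A.ItemNumber
LEFT JOIN (SELECT item_id, original_cost
           FROM (SELECT item_id, original_cost,
                        ROW_NUMBER() OVER (PARTITION BY item_id
                                           ORDER BY month_ending DESC) AS rn
                 FROM ITEM2) AS dt
           WHERE rn = 1) AS FR
  ON FR.item_id = A.ItemNumber
ORDER BY A.ItemNumber
"""


def latest_costs_row_number():
    """Return (ItemNumber, SourceStore, Cost) rows using the ROW_NUMBER rewrite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in latest_costs_row_number():
        print(row)
```

Note that each derived table numbers every row of ITEM1 and ITEM2, not only the items listed in #TEMPA, which may be part of why these rewrites were slower than the correlated MAX version in the timings reported in the question.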

Answer 1 (score: 1)

If this is SQL Server 2008 or later, try this...

;With   OrderedItem1 As
(
        Select  Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
                item_id,
                original_cost
        From    REMOTEDB.ITEM1
),      OrderedItem2 As
(
        Select  Row_Number() Over (Partition By item_id Order By Month_Ending Desc) As recentOrderID,
                item_id,
                original_cost
        From    REMOTEDB.ITEM2
),      maxItem1 As
(
        Select  item_id,
                original_cost
        From    OrderedItem1
        Where   recentOrderID = 1
),      maxItem2 As
(
        Select  item_id,
                original_cost
        From    OrderedItem2
        Where   recentOrderID = 1
)
Select  GetDate() As DateElement,
        A.SourceStore,
        IsNull(FR.original_cost,CO.original_cost) As Cost
From    #TEMPA As A
Join    maxItem1 As CO
        On  CO.item_id = A.ItemNumber
Left    Join maxItem2 FR
        On  FR.item_id = A.ItemNumber

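...You mentioned in the original post that every item will always have a record in ITEM1, so your WHERE CO.item_id IS NOT NULL OR FR.item_id IS NOT NULL does nothing (in fact, the inner join already filters out any item without an ITEM1 row).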
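To see why the filter is redundant, the sketch below runs a simplified version of the question's query with and without the WHERE clause. It assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server and uses hypothetical data with at most one row per item per table, so the latest-month filtering is omitted. It also includes a hypothetical item 3 that exists only in ITEM2: the INNER JOIN drops it in both versions, so the OR FR.item_id IS NOT NULL branch can never bring it back.

```python
import sqlite3

# Hypothetical data, at most one row per item per table (latest-month logic omitted).
SETUP = """
CREATE TABLE TEMPA (ItemNumber INTEGER, SourceStore TEXT);
CREATE TABLE ITEM1 (item_id INTEGER, month_ending TEXT, original_cost REAL);
CREATE TABLE ITEM2 (item_id INTEGER, month_ending TEXT, original_cost REAL);
INSERT INTO TEMPA VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
INSERT INTO ITEM1 VALUES (1, '2014-06-30', 11.0), (2, '2014-06-30', 20.0);
-- item 3 exists only in ITEM2, violating the "always in ITEM1" assumption
INSERT INTO ITEM2 VALUES (1, '2014-05-31', 9.0), (3, '2014-06-30', 30.0);
"""

BASE = """
SELECT A.ItemNumber, COALESCE(FR.original_cost, CO.original_cost) AS Cost
FROM TEMPA A
JOIN ITEM1 CO ON CO.item_id = A.ItemNumber
LEFT JOIN ITEM2 FR ON FR.item_id = A.ItemNumber
"""
WHERE_CLAUSE = " WHERE CO.item_id IS NOT NULL OR FR.item_id IS NOT NULL"
ORDER = " ORDER BY A.ItemNumber"


def compare():
    """Return (rows without the WHERE clause, rows with it)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    without_where = conn.execute(BASE + ORDER).fetchall()
    with_where = conn.execute(BASE + WHERE_CLAUSE + ORDER).fetchall()
    return without_where, with_where


if __name__ == "__main__":
    print(compare())
```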

Answer 2 (score: 0)

After a lot of testing and experimenting, I arrived at the following, which performs better than anything else I tried:

SELECT DISTINCT OInv.Item_ID, OInv.Month_Ending, OInv.Original_Cost
FROM (
    SELECT Item_ID, Month_Ending, Original_Cost
    FROM ho_data.dbo.CO_Ho_Inven
    UNION ALL
    SELECT Item_ID, Month_Ending, Original_Cost
    FROM ho_data.dbo.FR_Ho_Inven
) OInv
INNER JOIN (
    SELECT UInv.Item_ID, MAX(UInv.Month_ending) AS Month_Ending, MAX(original_cost) AS original_cost
    FROM (
        SELECT Item_ID, Month_Ending, original_cost
        FROM ho_data.dbo.CO_Ho_Inven
        UNION ALL
        SELECT Item_ID, Month_Ending, original_cost
        FROM ho_data.dbo.FR_Ho_Inven
    ) UInv
    GROUP BY UInv.Item_ID
) UInv
ON OInv.Item_ID = UInv.Item_ID
AND OInv.Month_Ending = UInv.Month_Ending
AND OInv.original_cost = UInv.original_cost
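One detail worth checking in this UNION ALL approach: MAX(original_cost) is taken over all of an item's months, independently of MAX(Month_Ending), so the extra cost condition only matches when the latest month also has the highest cost. The sketch below runs the same shape with and without that condition on hypothetical data in which item 1's cost went down in its latest month. It assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server; the table names mirror the query above, but the rows are invented.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for ho_data.dbo.CO_Ho_Inven / FR_Ho_Inven.
SETUP = """
CREATE TABLE CO_Ho_Inven (Item_ID INTEGER, Month_Ending TEXT, Original_Cost REAL);
CREATE TABLE FR_Ho_Inven (Item_ID INTEGER, Month_Ending TEXT, Original_Cost REAL);
-- item 1: cost went DOWN in the latest month
INSERT INTO CO_Ho_Inven VALUES (1, '2014-05-31', 12.0), (1, '2014-06-30', 10.0),
                               (2, '2014-05-31', 20.0);
-- item 2: the latest row lives in the FR table
INSERT INTO FR_Ho_Inven VALUES (2, '2014-06-30', 21.0);
"""

UNION_ALL = """
SELECT Item_ID, Month_Ending, Original_Cost FROM CO_Ho_Inven
UNION ALL
SELECT Item_ID, Month_Ending, Original_Cost FROM FR_Ho_Inven
"""


def latest_rows(match_cost):
    """Union both tables, find each item's latest month, and join back to its row(s)."""
    cost_condition = "AND OInv.Original_Cost = UInv.Original_Cost" if match_cost else ""
    sql = f"""
    SELECT DISTINCT OInv.Item_ID, OInv.Month_Ending, OInv.Original_Cost
    FROM ({UNION_ALL}) AS OInv
    JOIN (SELECT Item_ID, MAX(Month_Ending) AS Month_Ending,
                 MAX(Original_Cost) AS Original_Cost
          FROM ({UNION_ALL}) AS AllRows
          GROUP BY Item_ID) AS UInv
      ON OInv.Item_ID = UInv.Item_ID
     AND OInv.Month_Ending = UInv.Month_Ending
     {cost_condition}
    ORDER BY OInv.Item_ID
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print("with cost condition:   ", latest_rows(match_cost=True))
    print("without cost condition:", latest_rows(match_cost=False))
```

With the cost condition, item 1 drops out because its latest cost (10.0) is not its highest (12.0); joining on Item_ID and Month_Ending alone keeps it. If an item has rows in both tables for its latest month, this pattern returns both (DISTINCT only collapses identical rows), so a tie-break would still be needed if ITEM2's row must win.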