Alternative to ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)

Time: 2014-12-05 15:09:36

Tags: sql sql-server tsql sql-server-2008-r2

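There are plenty of questions on here about the poor performance of ROW_NUMBER() OVER (PARTITION BY XXX ORDER BY YYY), but I haven't been able to apply any of the answers to my scenario.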

This may be a lack of understanding on my part, or my situation may not suit this approach and a completely different one is needed.

I'm dealing with a large amount of data (several GB); the row counts per table are as follows:

  • tbl_SynergyCache_PUR_ORD: 180,046 (purchase order headers (POs))
  • tbl_SynergyCache_PUR_LINE: 1,746,629 (purchase order lines)
  • tbl_SynergyCache_SOH_HEAD: 2,345,347 (sales order headers (invoices))
  • tbl_SynergyCache_SOL_LINE: 35,615,535 (invoice lines)

My requirement is to get 1 row per product (PRO_CODE), with the following values for each product:

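  • Product code (PRO_CODE)
  • Latest invoice date (SOH_DATE_ENTER)
  • Latest invoice number (DOC_NUMBER)
  • Latest invoice account number (SLS_CODE)
  • Latest invoice price (SOL_PRI_TRA)
  • Latest purchase order date (PUR_DATE)
  • Latest purchase order number (DOC_NUMBER)
  • Latest PO account number (PUR_CODE)
  • Latest purchase order price (PUR_PRI_TRA)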

I'm not one to ask for an answer without having a go myself, so I tried it with ROW_NUMBER(), but the report took over half an hour to run (no exaggeration).

My tables are very simple:

CREATE TABLE [dbo].[tbl_SynergyCache_PUR_LINE] ( [DEPOT_CODE] [char] (10) NULL, [DOC_ID] [char] (2) NULL, [DOC_NUMBER] [char] (10) NULL, [PRO_CODE] [char] (50) NULL, [PUR_PRI_TRA] [float] NULL )
CREATE TABLE [dbo].[tbl_SynergyCache_PUR_ORD]  ( [DEPOT_CODE] [char] (10) NULL, [DOC_ID] [char] (2) NULL, [DOC_NUMBER] [char] (10) NULL, [PUR_CODE] [char] (20) NULL, [PUR_DATE] [datetime] NULL )
CREATE TABLE [dbo].[tbl_SynergyCache_SOH_HEAD] ( [DEPOT_CODE] [char] (10) NULL, [DOC_ID] [char] (2) NULL, [DOC_NUMBER] [char] (10) NULL, [SLS_CODE] [char] (20) NULL, [SOH_DATE_ENTER] [datetime] NULL )
CREATE TABLE [dbo].[tbl_SynergyCache_SOL_LINE] ( [DEPOT_CODE] [char] (10) NULL, [DOC_ID] [char] (2) NULL, [DOC_NUMBER] [char] (10) NULL, [PRO_CODE] [char] (50) NULL, [SOL_PRI_TRA] [float] NULL )

I created clustered indexes to speed up the joins, as follows:

CREATE CLUSTERED INDEX [IX_tbl_SynergyCache_PUR_LINE] ON [dbo].[tbl_SynergyCache_PUR_LINE] ( [DEPOT_CODE] ASC, [DOC_ID] ASC, [DOC_NUMBER] ASC )
CREATE CLUSTERED INDEX [IX_tbl_SynergyCache_PUR_ORD]  ON [dbo].[tbl_SynergyCache_PUR_ORD]  ( [DEPOT_CODE] ASC, [DOC_ID] ASC, [DOC_NUMBER] ASC )
CREATE CLUSTERED INDEX [IX_tbl_SynergyCache_SOH_HEAD] ON [dbo].[tbl_SynergyCache_SOH_HEAD] ( [DEPOT_CODE] ASC, [DOC_ID] ASC, [DOC_NUMBER] ASC )
CREATE CLUSTERED INDEX [IX_tbl_SynergyCache_SOL_LINE] ON [dbo].[tbl_SynergyCache_SOL_LINE] ( [DEPOT_CODE] ASC, [DOC_ID] ASC, [DOC_NUMBER] ASC )

Here is my first attempt:

WITH    RawSalesData
          AS ( SELECT   Orders.DOC_NUMBER ,
                        Orders.SLS_CODE ,
                        Orders.SOH_DATE_ENTER ,
                        Lines.PRO_CODE ,
                        Lines.SOL_PRI_TRA ,
                        ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE ORDER BY Orders.SOH_DATE_ENTER DESC ) AS SaleRank
               FROM     tbl_SynergyCache_SOH_HEAD AS Orders WITH ( NOLOCK )
                        INNER JOIN tbl_SynergyCache_SOL_LINE AS Lines WITH ( NOLOCK ) ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                                                         AND Lines.DOC_ID = Orders.DOC_ID
                                                                                         AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
             ),
        RawPurchaseData
          AS ( SELECT   Orders.DOC_NUMBER ,
                        Orders.PUR_CODE ,
                        Orders.PUR_DATE ,
                        Lines.PRO_CODE ,
                        Lines.PUR_PRI_TRA ,
                        ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE ORDER BY Orders.PUR_DATE DESC ) AS PurchaseRank
               FROM     tbl_SynergyCache_PUR_ORD AS Orders WITH ( NOLOCK )
                        INNER JOIN tbl_SynergyCache_PUR_LINE AS Lines WITH ( NOLOCK ) ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                                                         AND Lines.DOC_ID = Orders.DOC_ID
                                                                                         AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
             ),
        ActiveProducts_All
          AS ( SELECT   PRO_CODE
               FROM     RawSalesData
               UNION ALL
               SELECT   PRO_CODE
               FROM     RawPurchaseData
             ),
        ActiveProducts
          AS ( SELECT DISTINCT
                        PRO_CODE
               FROM     ActiveProducts_All
               WHERE    LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
             )
    SELECT  ActiveProducts.PRO_CODE AS ProductCode,
            RawSalesData.SOH_DATE_ENTER AS LastSold,
            RawSalesData.DOC_NUMBER AS LastSold_OrderNo,
            RawSalesData.SLS_CODE AS LastSold_AccountNo,
            RawSalesData.SOL_PRI_TRA AS LastSold_Price,
            RawPurchaseData.PUR_DATE AS LastPurchased,
            RawPurchaseData.DOC_NUMBER AS LastPurchased_PurchaseOrderNo,
            RawPurchaseData.PUR_CODE AS LastPurchased_AccountNo,
            RawPurchaseData.PUR_PRI_TRA AS LastPurchased_Price
    FROM    ActiveProducts
            LEFT OUTER JOIN RawSalesData ON RawSalesData.PRO_CODE = ActiveProducts.PRO_CODE
                                            AND RawSalesData.SaleRank = 1
            LEFT OUTER JOIN RawPurchaseData ON RawPurchaseData.PRO_CODE = ActiveProducts.PRO_CODE
                                               AND RawPurchaseData.PurchaseRank = 1
    ORDER BY ActiveProducts.PRO_CODE

So my question is: how can I make this query run efficiently? If I'm flogging a dead horse with ROW_NUMBER(), what alternative would meet the requirement?

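To give you an idea of scale: although there are 40 million rows of source data here, there are only 33,000 distinct products, so the output of this query is relatively small compared with the input.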
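One commonly used set-based alternative for "latest row per group" is to aggregate to the latest date first and then join back for the remaining columns. The T-SQL below is a minimal sketch of that pattern for the sales side only, added for illustration and not taken from the question or the answers. The table variables @Head, @Line and @Result and their sample rows are hypothetical stand-ins for tbl_SynergyCache_SOH_HEAD and tbl_SynergyCache_SOL_LINE; whether this beats ROW_NUMBER() on 35 million invoice lines would have to be measured.

```sql
-- Hypothetical sample rows shaped like tbl_SynergyCache_SOH_HEAD / tbl_SynergyCache_SOL_LINE
DECLARE @Head TABLE ( DEPOT_CODE CHAR(10), DOC_ID CHAR(2), DOC_NUMBER CHAR(10),
                      SLS_CODE CHAR(20), SOH_DATE_ENTER DATETIME );
DECLARE @Line TABLE ( DEPOT_CODE CHAR(10), DOC_ID CHAR(2), DOC_NUMBER CHAR(10),
                      PRO_CODE CHAR(50), SOL_PRI_TRA FLOAT );
DECLARE @Result TABLE ( PRO_CODE CHAR(50), SOH_DATE_ENTER DATETIME, DOC_NUMBER CHAR(10),
                        SLS_CODE CHAR(20), SOL_PRI_TRA FLOAT );

INSERT INTO @Head VALUES ( 'D1', 'IN', '1001', 'ACC1', '20141101' ),
                         ( 'D1', 'IN', '1002', 'ACC2', '20141201' );
INSERT INTO @Line VALUES ( 'D1', 'IN', '1001', 'P1', 9.5 ),
                         ( 'D1', 'IN', '1001', 'P2', 4.0 ),
                         ( 'D1', 'IN', '1002', 'P1', 10.0 );

-- Step 1: latest invoice date per product with a plain aggregate,
-- which the optimizer can satisfy with a hash aggregate instead of a per-product sort.
WITH    LastSale
          AS ( SELECT   Lines.PRO_CODE ,
                        MAX(Orders.SOH_DATE_ENTER) AS SOH_DATE_ENTER
               FROM     @Head AS Orders
                        JOIN @Line AS Lines ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                           AND Lines.DOC_ID = Orders.DOC_ID
                                           AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
               GROUP BY Lines.PRO_CODE
             )
    -- Step 2: join back to pick up the other columns of that latest invoice.
    INSERT  INTO @Result
            SELECT  LastSale.PRO_CODE ,
                    LastSale.SOH_DATE_ENTER ,
                    Orders.DOC_NUMBER ,
                    Orders.SLS_CODE ,
                    Lines.SOL_PRI_TRA
            FROM    LastSale
                    JOIN @Line AS Lines ON Lines.PRO_CODE = LastSale.PRO_CODE
                    JOIN @Head AS Orders ON Orders.DEPOT_CODE = Lines.DEPOT_CODE
                                        AND Orders.DOC_ID = Lines.DOC_ID
                                        AND Orders.DOC_NUMBER = Lines.DOC_NUMBER
                                        AND Orders.SOH_DATE_ENTER = LastSale.SOH_DATE_ENTER;

-- P1 comes from invoice 1002 (price 10.0), P2 from invoice 1001 (price 4.0)
SELECT * FROM @Result ORDER BY PRO_CODE;
```

Unlike ROW_NUMBER(), which keeps exactly one row per product, this returns every invoice sharing the latest date, so a tie-breaker would still be needed if that can happen.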

Edit: the execution plan for my 30+ minute run looks OK (i.e. no table scans), but I have a feeling I'm making SQL Server do more work than necessary:

  |--Parallelism(Gather Streams, ORDER BY:([Union1014] ASC))
       |--Merge Join(Right Outer Join, MANY-TO-MANY MERGE:([Lines].[PRO_CODE])=([Union1014]), RESIDUAL:([Union1014]=[dbo].[tbl_SynergyCache_PUR_LINE].[PRO_CODE] as [Lines].[PRO_CODE]))
            |--Filter(WHERE:([Expr1034]=(1)))
            |    |--Compute Scalar(DEFINE:([Expr1035]=[dbo].[tbl_SynergyCache_PUR_ORD].[DOC_NUMBER] as [Orders].[DOC_NUMBER], [Expr1036]=[dbo].[tbl_SynergyCache_PUR_ORD].[PUR_CODE] as [Orders].[PUR_CODE], [Expr1037]=[dbo].[tbl_SynergyCache_PUR_ORD].[PUR_DATE] as [Orders].[PUR_DATE], [Expr1039]=[dbo].[tbl_SynergyCache_PUR_LINE].[PUR_PRI_TRA] as [Lines].[PUR_PRI_TRA]))
            |         |--Parallelism(Distribute Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[PRO_CODE]))
            |              |--Sequence Project(DEFINE:([Expr1034]=row_number))
            |                   |--Segment
            |                        |--Parallelism(Gather Streams, ORDER BY:([Lines].[PRO_CODE] ASC, [Orders].[PUR_DATE] DESC))
            |                             |--Sort(ORDER BY:([Lines].[PRO_CODE] ASC, [Orders].[PUR_DATE] DESC))
            |                                  |--Hash Match(Inner Join, HASH:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER])=([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]), RESIDUAL:([dbo].[tbl_SynergyCache_PUR_LINE].[DEPOT_CODE] as [Lines].[DEPOT_CODE]=[dbo].[tbl_SynergyCache_PUR_ORD].[DEPOT_CODE] as [Orders].[DEPOT_CODE] AND [dbo].[tbl_SynergyCache_PUR_LINE].[DOC_ID] as [Lines].[DOC_ID]=[dbo].[tbl_SynergyCache_PUR_ORD].[DOC_ID] as [Orders].[DOC_ID] AND [dbo].[tbl_SynergyCache_PUR_LINE].[DOC_NUMBER] as [Lines].[DOC_NUMBER]=[dbo].[tbl_SynergyCache_PUR_ORD].[DOC_NUMBER] as [Orders].[DOC_NUMBER]))
            |                                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER]))
            |                                       |    |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_PUR_ORD].[IX_tbl_SynergyCache_PUR_ORD] AS [Orders]))
            |                                       |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]))
            |                                            |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_PUR_LINE].[IX_tbl_SynergyCache_PUR_LINE] AS [Lines]))
            |--Merge Join(Left Outer Join, MERGE:([Union1014])=([Lines].[PRO_CODE]), RESIDUAL:([Union1014]=[dbo].[tbl_SynergyCache_SOL_LINE].[PRO_CODE] as [Lines].[PRO_CODE]))
                 |--Sort(ORDER BY:([Union1014] ASC))
                 |    |--Hash Match(Aggregate, HASH:([Union1014]), RESIDUAL:([Union1014] = [Union1014]))
                 |         |--Concatenation
                 |              |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[PRO_CODE]))
                 |              |    |--Hash Match(Aggregate, HASH:([Lines].[PRO_CODE]), RESIDUAL:([dbo].[tbl_SynergyCache_SOL_LINE].[PRO_CODE] as [Lines].[PRO_CODE] = [dbo].[tbl_SynergyCache_SOL_LINE].[PRO_CODE] as [Lines].[PRO_CODE]))
                 |              |         |--Hash Match(Inner Join, HASH:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER])=([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]), RESIDUAL:([dbo].[tbl_SynergyCache_SOL_LINE].[DEPOT_CODE] as [Lines].[DEPOT_CODE]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DEPOT_CODE] as [Orders].[DEPOT_CODE] AND [dbo].[tbl_SynergyCache_SOL_LINE].[DOC_ID] as [Lines].[DOC_ID]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DOC_ID] as [Orders].[DOC_ID] AND [dbo].[tbl_SynergyCache_SOL_LINE].[DOC_NUMBER] as [Lines].[DOC_NUMBER]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DOC_NUMBER] as [Orders].[DOC_NUMBER]))
                 |              |              |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER]))
                 |              |              |    |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_SOH_HEAD].[IX_tbl_SynergyCache_SOH_HEAD] AS [Orders]))
                 |              |              |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]))
                 |              |                   |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_SOL_LINE].[IX_tbl_SynergyCache_SOL_LINE] AS [Lines]), WHERE:(ltrim(rtrim(isnull([dbo].[tbl_SynergyCache_SOL_LINE].[PRO_CODE] as [Lines].[PRO_CODE],'                                                  ')))<>''))
                 |              |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[PRO_CODE]))
                 |                   |--Hash Match(Aggregate, HASH:([Lines].[PRO_CODE]), RESIDUAL:([dbo].[tbl_SynergyCache_PUR_LINE].[PRO_CODE] as [Lines].[PRO_CODE] = [dbo].[tbl_SynergyCache_PUR_LINE].[PRO_CODE] as [Lines].[PRO_CODE]))
                 |                        |--Hash Match(Inner Join, HASH:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER])=([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]), RESIDUAL:([dbo].[tbl_SynergyCache_PUR_LINE].[DEPOT_CODE] as [Lines].[DEPOT_CODE]=[dbo].[tbl_SynergyCache_PUR_ORD].[DEPOT_CODE] as [Orders].[DEPOT_CODE] AND [dbo].[tbl_SynergyCache_PUR_LINE].[DOC_ID] as [Lines].[DOC_ID]=[dbo].[tbl_SynergyCache_PUR_ORD].[DOC_ID] as [Orders].[DOC_ID] AND [dbo].[tbl_SynergyCache_PUR_LINE].[DOC_NUMBER] as [Lines].[DOC_NUMBER]=[dbo].[tbl_SynergyCache_PUR_ORD].[DOC_NUMBER] as [Orders].[DOC_NUMBER]))
                 |                             |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER]))
                 |                             |    |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_PUR_ORD].[IX_tbl_SynergyCache_PUR_ORD] AS [Orders]))
                 |                             |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]))
                 |                                  |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_PUR_LINE].[IX_tbl_SynergyCache_PUR_LINE] AS [Lines]), WHERE:(ltrim(rtrim(isnull([dbo].[tbl_SynergyCache_PUR_LINE].[PRO_CODE] as [Lines].[PRO_CODE],'                                                  ')))<>''))
                 |--Filter(WHERE:([Expr1021]=(1)))
                      |--Compute Scalar(DEFINE:([Expr1022]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DOC_NUMBER] as [Orders].[DOC_NUMBER], [Expr1023]=[dbo].[tbl_SynergyCache_SOH_HEAD].[SLS_CODE] as [Orders].[SLS_CODE], [Expr1024]=[dbo].[tbl_SynergyCache_SOH_HEAD].[SOH_DATE_ENTER] as [Orders].[SOH_DATE_ENTER], [Expr1026]=[dbo].[tbl_SynergyCache_SOL_LINE].[SOL_PRI_TRA] as [Lines].[SOL_PRI_TRA]))
                           |--Parallelism(Distribute Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[PRO_CODE]))
                                |--Sequence Project(DEFINE:([Expr1021]=row_number))
                                     |--Segment
                                          |--Parallelism(Gather Streams, ORDER BY:([Lines].[PRO_CODE] ASC, [Orders].[SOH_DATE_ENTER] DESC))
                                               |--Sort(ORDER BY:([Lines].[PRO_CODE] ASC, [Orders].[SOH_DATE_ENTER] DESC))
                                                    |--Hash Match(Inner Join, HASH:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER])=([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]), RESIDUAL:([dbo].[tbl_SynergyCache_SOL_LINE].[DEPOT_CODE] as [Lines].[DEPOT_CODE]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DEPOT_CODE] as [Orders].[DEPOT_CODE] AND [dbo].[tbl_SynergyCache_SOL_LINE].[DOC_ID] as [Lines].[DOC_ID]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DOC_ID] as [Orders].[DOC_ID] AND [dbo].[tbl_SynergyCache_SOL_LINE].[DOC_NUMBER] as [Lines].[DOC_NUMBER]=[dbo].[tbl_SynergyCache_SOH_HEAD].[DOC_NUMBER] as [Orders].[DOC_NUMBER]))
                                                         |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Orders].[DEPOT_CODE], [Orders].[DOC_ID], [Orders].[DOC_NUMBER]))
                                                         |    |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_SOH_HEAD].[IX_tbl_SynergyCache_SOH_HEAD] AS [Orders]))
                                                         |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([Lines].[DEPOT_CODE], [Lines].[DOC_ID], [Lines].[DOC_NUMBER]))
                                                              |--Clustered Index Scan(OBJECT:([dbo].[tbl_SynergyCache_SOL_LINE].[IX_tbl_SynergyCache_SOL_LINE] AS [Lines]))

SQL Fiddle with 1 PO & 1 invoice for 2 PRO_CODEs:

http://sqlfiddle.com/#!3/557b6/1/0

Final solution, put together from the suggestions of Blam & Blue:

I basically broke the CTEs out into multiple indexed temp tables, which got the output in under 6 minutes instead of 30.

CREATE TABLE #tbl_RawSalesData
    (
      DOC_NUMBER CHAR(10) ,
      SLS_CODE CHAR(20) ,
      SOH_DATE_ENTER DATETIME ,
      PRO_CODE CHAR(50) PRIMARY KEY ,
      SOL_PRI_TRA FLOAT
    )

INSERT  INTO #tbl_RawSalesData
        ( DOC_NUMBER ,
          SLS_CODE ,
          SOH_DATE_ENTER ,
          PRO_CODE ,
          SOL_PRI_TRA
        )
        SELECT  DOC_NUMBER ,
                SLS_CODE ,
                SOH_DATE_ENTER ,
                PRO_CODE ,
                SOL_PRI_TRA
        FROM    ( SELECT    Orders.DOC_NUMBER ,
                            Orders.SLS_CODE ,
                            Orders.SOH_DATE_ENTER ,
                            Lines.PRO_CODE ,
                            Lines.SOL_PRI_TRA ,
                            ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE ORDER BY Orders.SOH_DATE_ENTER DESC ) AS SaleRank
                  FROM      tbl_SynergyCache_SOH_HEAD AS Orders WITH ( NOLOCK )
                            JOIN tbl_SynergyCache_SOL_LINE AS Lines WITH ( NOLOCK ) ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                                                       AND Lines.DOC_ID = Orders.DOC_ID
                                                                                       AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
                ) AS RawSalesData
        WHERE   SaleRank = 1
                AND LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''

CREATE TABLE #tbl_RawPurchaseData
    (
      DOC_NUMBER CHAR(10) ,
      PUR_CODE CHAR(20) ,
      PUR_DATE DATETIME ,
      PRO_CODE CHAR(50) PRIMARY KEY ,
      PUR_PRI_TRA FLOAT
    )

INSERT  INTO #tbl_RawPurchaseData
        ( DOC_NUMBER ,
          PUR_CODE ,
          PUR_DATE ,
          PRO_CODE ,
          PUR_PRI_TRA
        )
        SELECT  DOC_NUMBER ,
                PUR_CODE ,
                PUR_DATE ,
                PRO_CODE ,
                PUR_PRI_TRA
        FROM    ( SELECT    Orders.DOC_NUMBER ,
                            Orders.PUR_CODE ,
                            Orders.PUR_DATE ,
                            Lines.PRO_CODE ,
                            Lines.PUR_PRI_TRA ,
                            ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE ORDER BY Orders.PUR_DATE DESC ) AS PurchaseRank
                  FROM      tbl_SynergyCache_PUR_ORD AS Orders WITH ( NOLOCK )
                            INNER JOIN tbl_SynergyCache_PUR_LINE AS Lines WITH ( NOLOCK ) ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                                                             AND Lines.DOC_ID = Orders.DOC_ID
                                                                                             AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
                ) AS RawPurchaseData
        WHERE   PurchaseRank = 1
                AND LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''

CREATE TABLE #tbl_ActiveProducts
    (
      PRO_CODE CHAR(50) PRIMARY KEY
    )

INSERT  INTO #tbl_ActiveProducts
        ( PRO_CODE
        )
        SELECT  PRO_CODE
        FROM    #tbl_RawSalesData
        UNION
        SELECT  PRO_CODE
        FROM    #tbl_RawPurchaseData

SELECT  A.PRO_CODE ,
        S.SOH_DATE_ENTER ,
        S.DOC_NUMBER ,
        S.SLS_CODE ,
        S.SOL_PRI_TRA ,
        P.PUR_DATE ,
        P.DOC_NUMBER ,
        P.PUR_CODE ,
        P.PUR_PRI_TRA
FROM    #tbl_ActiveProducts AS A
        LEFT OUTER JOIN #tbl_RawPurchaseData AS P ON P.PRO_CODE = A.PRO_CODE
        LEFT OUTER JOIN #tbl_RawSalesData AS S ON S.PRO_CODE = A.PRO_CODE

DROP TABLE #tbl_ActiveProducts
DROP TABLE #tbl_RawPurchaseData
DROP TABLE #tbl_RawSalesData

2 Answers:

Answer 0 (score: 3)

The query is basically correct.
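Your problem is the CTEs. A CTE is just syntax: it is not materialized but expanded wherever it is referenced, so these CTEs get evaluated, and very likely more than once.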
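The plan in the question already hints at this: SOH_HEAD and SOL_LINE are each scanned twice, once per reference to RawSalesData. A minimal way to see the inlining directly is to put a non-deterministic function in a CTE and reference it twice. This T-SQL sketch is an illustration added here, not part of the original answer; the names OneGuid and #OneGuid are hypothetical.

```sql
-- A CTE is inlined wherever it is referenced,
-- so NEWID() is typically evaluated once per reference (two different values).
WITH    OneGuid AS ( SELECT NEWID() AS Id )
SELECT  a.Id AS FirstReference ,
        b.Id AS SecondReference
FROM    OneGuid AS a
        CROSS JOIN OneGuid AS b;

-- A #temp table is evaluated once and stored, so both references see the same value.
SELECT  NEWID() AS Id
INTO    #OneGuid;

SELECT  a.Id AS FirstReference ,
        b.Id AS SecondReference
FROM    #OneGuid AS a
        CROSS JOIN #OneGuid AS b;

DROP TABLE #OneGuid;
```

The same reasoning is why materializing only the rank-1 rows into #temp tables pays off: the expensive join and sort run once.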

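The second problem is that joining on two conditions (PRO_CODE plus SaleRank = 1 or PurchaseRank = 1) is usually not efficient.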

Materialize row 1 (and only row 1) into a #temp table, and even declare a PK on it.

So your joins should look more like this:

FROM #ActiveProducts
LEFT OUTER JOIN #RawSalesData 
  ON #RawSalesData.PRO_CODE = #ActiveProducts.PRO_CODE
LEFT OUTER JOIN #RawPurchaseData 
  ON #RawPurchaseData.PRO_CODE = #ActiveProducts.PRO_CODE

P.S. You can use the CTEs to populate the #temp tables, but don't use the CTEs in the joins of the final query.

You're doing a lot of stuff that I don't think is needed:

create table #RawSalesData  .. 
create table #RawPurchaseData ...    

-- A CTE belongs to the single statement that follows it,
-- so each CTE is followed directly by the INSERT that uses it.
;WITH   RawSalesData
          AS ( SELECT   Orders.DOC_NUMBER ,
                        Orders.SLS_CODE ,
                        Orders.SOH_DATE_ENTER ,
                        Lines.PRO_CODE ,
                        Lines.SOL_PRI_TRA ,
                        ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE 
                                            ORDER BY     Orders.SOH_DATE_ENTER DESC ) AS SaleRank
               FROM     tbl_SynergyCache_SOH_HEAD AS Orders WITH ( NOLOCK )
               JOIN     tbl_SynergyCache_SOL_LINE AS Lines  WITH ( NOLOCK ) 
                 ON     Lines.DEPOT_CODE = Orders.DEPOT_CODE
                AND     Lines.DOC_ID = Orders.DOC_ID
                AND     Lines.DOC_NUMBER = Orders.DOC_NUMBER
             )
insert into #RawSalesData  ...
select ... 
from   RawSalesData 
where  RawSalesData.SaleRank = 1 
and    LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
order by RawSalesData.PRO_CODE;  -- index this column for an even faster join

;WITH   RawPurchaseData
          AS ( SELECT   Orders.DOC_NUMBER ,
                        Orders.PUR_CODE ,
                        Orders.PUR_DATE ,
                        Lines.PRO_CODE ,
                        Lines.PUR_PRI_TRA ,
                        ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE 
                                            ORDER BY     Orders.PUR_DATE DESC ) AS PurchaseRank
               FROM     tbl_SynergyCache_PUR_ORD AS Orders WITH ( NOLOCK )
               JOIN     tbl_SynergyCache_PUR_LINE AS Lines WITH ( NOLOCK ) 
                 ON     Lines.DEPOT_CODE = Orders.DEPOT_CODE
                AND     Lines.DOC_ID = Orders.DOC_ID
                AND     Lines.DOC_NUMBER = Orders.DOC_NUMBER
             )
insert into #RawPurchaseData  ...
select ... 
from   RawPurchaseData
where  RawPurchaseData.PurchaseRank = 1
and    LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
order by RawPurchaseData.PRO_CODE;

    SELECT  #RawSalesData.PRO_CODE AS ProductCode,
            #RawSalesData.SOH_DATE_ENTER AS LastSold,
            #RawSalesData.DOC_NUMBER AS LastSold_OrderNo,
            #RawSalesData.SLS_CODE AS LastSold_AccountNo,
            #RawSalesData.SOL_PRI_TRA AS LastSold_Price,
            #RawPurchaseData.PUR_DATE AS LastPurchased,
            #RawPurchaseData.DOC_NUMBER AS LastPurchased_PurchaseOrderNo,
            #RawPurchaseData.PUR_CODE AS LastPurchased_AccountNo,
            #RawPurchaseData.PUR_PRI_TRA AS LastPurchased_Price
    FROM    #RawSalesData  
    LEFT OUTER JOIN #RawPurchaseData  
      ON    #RawPurchaseData.PRO_CODE = #RawSalesData.PRO_CODE  
      --    now this join is trivial                                             
    ORDER BY #RawSalesData.PRO_CODE

Tweaks to the revised solution:

CREATE TABLE #tbl_RawSalesData
    (
      DOC_NUMBER CHAR(10) ,
      SLS_CODE CHAR(20) ,
      SOH_DATE_ENTER DATETIME ,
      PRO_CODE CHAR(50) PRIMARY KEY ,
      SOL_PRI_TRA FLOAT
    )

INSERT  INTO #tbl_RawSalesData
        ( DOC_NUMBER ,
          SLS_CODE ,
          SOH_DATE_ENTER ,
          PRO_CODE ,
          SOL_PRI_TRA
        )
        SELECT  DOC_NUMBER ,
                SLS_CODE ,
                SOH_DATE_ENTER ,
                PRO_CODE ,
                SOL_PRI_TRA
        FROM    ( SELECT    Orders.DOC_NUMBER ,
                            Orders.SLS_CODE ,
                            Orders.SOH_DATE_ENTER ,
                            Lines.PRO_CODE ,
                            Lines.SOL_PRI_TRA ,
                            ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE ORDER BY Orders.SOH_DATE_ENTER DESC ) AS SaleRank
                  FROM      tbl_SynergyCache_SOH_HEAD AS Orders WITH ( NOLOCK )
                            JOIN tbl_SynergyCache_SOL_LINE AS Lines WITH ( NOLOCK ) ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                                                       AND Lines.DOC_ID = Orders.DOC_ID
                                                                                       AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
                  Where     LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
                ) AS RawSalesData
        WHERE   SaleRank = 1 
        ORDER BY PRO_CODE

CREATE TABLE #tbl_RawPurchaseData
    (
      DOC_NUMBER CHAR(10) ,
      PUR_CODE CHAR(20) ,
      PUR_DATE DATETIME ,
      PRO_CODE CHAR(50) PRIMARY KEY ,
      PUR_PRI_TRA FLOAT
    )

INSERT  INTO #tbl_RawPurchaseData
        ( DOC_NUMBER ,
          PUR_CODE ,
          PUR_DATE ,
          PRO_CODE ,
          PUR_PRI_TRA
        )
        SELECT  DOC_NUMBER ,
                PUR_CODE ,
                PUR_DATE ,
                PRO_CODE ,
                PUR_PRI_TRA
        FROM    ( SELECT    Orders.DOC_NUMBER ,
                            Orders.PUR_CODE ,
                            Orders.PUR_DATE ,
                            Lines.PRO_CODE ,
                            Lines.PUR_PRI_TRA ,
                            ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE ORDER BY Orders.PUR_DATE DESC ) AS PurchaseRank
                  FROM      tbl_SynergyCache_PUR_ORD AS Orders WITH ( NOLOCK )
                            INNER JOIN tbl_SynergyCache_PUR_LINE AS Lines WITH ( NOLOCK ) ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                                                            AND Lines.DOC_ID = Orders.DOC_ID
                                                                                            AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
                  Where     LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
                ) AS RawPurchaseData
        WHERE   PurchaseRank = 1
        ORDER BY PRO_CODE


SELECT  ISNULL(P.PRO_CODE, S.PRO_CODE) AS PRO_CODE ,
        S.SOH_DATE_ENTER ,
        S.DOC_NUMBER ,
        S.SLS_CODE ,
        S.SOL_PRI_TRA ,
        P.PUR_DATE ,
        P.DOC_NUMBER ,
        P.PUR_CODE ,
        P.PUR_PRI_TRA
FROM            #tbl_RawPurchaseData AS P 
FULL OUTER JOIN #tbl_RawSalesData    AS S ON S.PRO_CODE = P.PRO_CODE

DROP TABLE #tbl_RawPurchaseData
DROP TABLE #tbl_RawSalesData

Answer 1 (score: 0)

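One conceptual problem, regardless of anything else: you are JOINing the largest tables, sorting for the ROW_NUMBER() (i.e. the PARTITION BY part) and so on, and only THEN filtering? You have the following in one of the later CTEs: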

  ActiveProducts
      AS ( SELECT DISTINCT
                    PRO_CODE
           FROM     ActiveProducts_All
           WHERE    LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
         )

This WHERE condition needs to be applied in the two underlying CTEs, RawSalesData and RawPurchaseData, and then removed from this CTE:

RawSalesData
      AS ( SELECT   Orders.DOC_NUMBER ,
                    Orders.SLS_CODE ,
                    Orders.SOH_DATE_ENTER ,
                    Lines.PRO_CODE ,
                    Lines.SOL_PRI_TRA ,
                    ROW_NUMBER() OVER ( PARTITION BY Lines.PRO_CODE
                                   ORDER BY Orders.SOH_DATE_ENTER DESC ) AS SaleRank
           FROM     tbl_SynergyCache_SOH_HEAD AS Orders WITH ( NOLOCK )
           INNER JOIN tbl_SynergyCache_SOL_LINE AS Lines WITH ( NOLOCK )
                   ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                  AND Lines.DOC_ID = Orders.DOC_ID
                  AND Lines.DOC_NUMBER = Orders.DOC_NUMBER
           WHERE    LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> ''
         )
 ....
 ActiveProducts
     AS ( SELECT DISTINCT PRO_CODE
          FROM     ActiveProducts_All
        )
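Moving the predicate below ROW_NUMBER() cannot change which row ranks first for any product that survives, because the predicate only tests PRO_CODE, the PARTITION BY column: it removes whole partitions, never individual rows within a partition. It just means fewer rows reach the sort. A minimal self-contained check in T-SQL; the table variables and sample rows are hypothetical:

```sql
-- Hypothetical sample rows: two real products, one blank code, one NULL code
DECLARE @Sales TABLE ( PRO_CODE CHAR(50), SOH_DATE_ENTER DATETIME, SOL_PRI_TRA FLOAT );
INSERT INTO @Sales VALUES ( 'P1', '20141101', 9.5 ),
                          ( 'P1', '20141201', 10.0 ),
                          ( '', '20141205', 1.0 ),
                          ( NULL, '20141206', 2.0 ),
                          ( 'P2', '20141102', 4.0 );

DECLARE @FilterAfter TABLE ( PRO_CODE CHAR(50), SOL_PRI_TRA FLOAT );
DECLARE @FilterBefore TABLE ( PRO_CODE CHAR(50), SOL_PRI_TRA FLOAT );

-- Filter applied after ROW_NUMBER(): every row is ranked, blank codes are discarded later
INSERT  INTO @FilterAfter
        SELECT  PRO_CODE , SOL_PRI_TRA
        FROM    ( SELECT PRO_CODE , SOL_PRI_TRA ,
                         ROW_NUMBER() OVER ( PARTITION BY PRO_CODE ORDER BY SOH_DATE_ENTER DESC ) AS SaleRank
                  FROM   @Sales ) AS Ranked
        WHERE   SaleRank = 1
                AND LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> '';

-- Filter applied before ROW_NUMBER(): blank codes never reach the sort
INSERT  INTO @FilterBefore
        SELECT  PRO_CODE , SOL_PRI_TRA
        FROM    ( SELECT PRO_CODE , SOL_PRI_TRA ,
                         ROW_NUMBER() OVER ( PARTITION BY PRO_CODE ORDER BY SOH_DATE_ENTER DESC ) AS SaleRank
                  FROM   @Sales
                  WHERE  LTRIM(RTRIM(ISNULL(PRO_CODE, ''))) <> '' ) AS Ranked
        WHERE   SaleRank = 1;

-- Both contain P1 with 10.0 and P2 with 4.0
SELECT * FROM @FilterAfter ORDER BY PRO_CODE;
SELECT * FROM @FilterBefore ORDER BY PRO_CODE;
```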

Also, regardless of any changes I suggest, or any you implement from @Blam's answer, there are a few things you could do:

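  • Define PKs and FKs on the tables (if none of the 3 key fields contain NULLs; they are defined as NULL, but that doesn't mean they contain NULLs, though you would have to alter the columns to NOT NULL as well):
    • Even if the source is third-party, knowing the relationships between the tables helps the optimizer.
    • Knowing that the key values are unique (even via just a CLUSTERED INDEX rather than a PK) also helps the optimizer. So even if you don't want to create a PK (or can't, because of NULL values in some of those 3 key fields), at least redefine the CLUSTERED INDEX as UNIQUE (on both Header tables).
  • Create nonclustered indexes to support the sorting for the PARTITION BY and ORDER BY operations:
    • tbl_SynergyCache_PUR_ORD.PUR_DATE DESC
    • tbl_SynergyCache_SOH_HEAD.SOH_DATE_ENTER DESC
  • And, before making any PK/FK changes: the 3 key fields (DEPOT_CODE, DOC_ID & DOC_NUMBER) seem to be used only for JOINs, not for sorting. That means you are joining on three case-insensitive string fields (not good). Every query that joins these tables would benefit from adding an INT NOT NULL IDENTITY field to the Header tables and an INT NOT NULL field to the Detail tables to hold the matching value. These new INT fields would then be the CLUSTERED PK on the Header tables and the CLUSTERED INDEX on the Detail tables (which can't be unique, and so can't be the PK).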
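A minimal T-SQL sketch of these suggestions, written against the tables from the question. It is an illustration under assumptions, not the answerer's script: the new index names, the INCLUDE columns and the PUR_ORD_ID column are hypothetical, only the purchase header/detail pair gets the surrogate key, and the UNIQUE rebuild will fail if duplicate keys already exist.

```sql
-- 1. Declare the header keys unique by rebuilding the existing clustered index in place.
--    Fails if duplicate (DEPOT_CODE, DOC_ID, DOC_NUMBER) rows already exist.
CREATE UNIQUE CLUSTERED INDEX [IX_tbl_SynergyCache_PUR_ORD]
    ON [dbo].[tbl_SynergyCache_PUR_ORD] ( [DEPOT_CODE] ASC, [DOC_ID] ASC, [DOC_NUMBER] ASC )
    WITH ( DROP_EXISTING = ON );
CREATE UNIQUE CLUSTERED INDEX [IX_tbl_SynergyCache_SOH_HEAD]
    ON [dbo].[tbl_SynergyCache_SOH_HEAD] ( [DEPOT_CODE] ASC, [DOC_ID] ASC, [DOC_NUMBER] ASC )
    WITH ( DROP_EXISTING = ON );

-- 2. Nonclustered indexes on the ORDER BY dates. The clustered key columns are carried
--    in every nonclustered index automatically; the INCLUDE columns are an assumption,
--    added so the header columns the query reads are covered.
CREATE NONCLUSTERED INDEX [IX_tbl_SynergyCache_PUR_ORD_PUR_DATE]
    ON [dbo].[tbl_SynergyCache_PUR_ORD] ( [PUR_DATE] DESC )
    INCLUDE ( [PUR_CODE] );
CREATE NONCLUSTERED INDEX [IX_tbl_SynergyCache_SOH_HEAD_SOH_DATE_ENTER]
    ON [dbo].[tbl_SynergyCache_SOH_HEAD] ( [SOH_DATE_ENTER] DESC )
    INCLUDE ( [SLS_CODE] );

-- 3. Surrogate INT key for the purchase pair (PUR_ORD_ID is a hypothetical name).
--    Adding an IDENTITY column numbers the existing header rows.
ALTER TABLE [dbo].[tbl_SynergyCache_PUR_ORD] ADD [PUR_ORD_ID] INT NOT NULL IDENTITY(1, 1);
ALTER TABLE [dbo].[tbl_SynergyCache_PUR_LINE] ADD [PUR_ORD_ID] INT NULL;
-- Batch separator (SSMS/sqlcmd): the new columns must exist before they are referenced
GO

-- Copy the matching header value onto each detail row
UPDATE  Lines
SET     Lines.PUR_ORD_ID = Orders.PUR_ORD_ID
FROM    [dbo].[tbl_SynergyCache_PUR_LINE] AS Lines
        JOIN [dbo].[tbl_SynergyCache_PUR_ORD] AS Orders ON Lines.DEPOT_CODE = Orders.DEPOT_CODE
                                                       AND Lines.DOC_ID = Orders.DOC_ID
                                                       AND Lines.DOC_NUMBER = Orders.DOC_NUMBER;

-- Detail table: non-unique clustered index on the new INT column.
-- Moving the header's clustered PK onto PUR_ORD_ID would also require dropping
-- the existing clustered index first; that step is not shown here.
CREATE CLUSTERED INDEX [IX_tbl_SynergyCache_PUR_LINE]
    ON [dbo].[tbl_SynergyCache_PUR_LINE] ( [PUR_ORD_ID] ASC )
    WITH ( DROP_EXISTING = ON );
```

Queries would then join on Lines.PUR_ORD_ID = Orders.PUR_ORD_ID instead of the three CHAR columns.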