Splitting a SQL table into sub-tables

Time: 2017-09-28 10:06:18

Tags: sql-server tsql

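I have a huge data table that needs to be aggregated in a certain way. The data is too large to process in one hit, so I first split the table into N sub-tables and run the aggregation on the separate chunks. The code that performs the split (here into 3 separate sub-tables) is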

SELECT [EpiSer], 
       [SINum], 
       [VolNum], 
       [CTPQty], 
       [VolAmt], 
       [CTPActivityGroupCode],  
       NTILE(3) OVER(ORDER BY Id) AS TilingIdx 
INTO [_Stage2] 
FROM [_Stage1];
GO

To create the second sub-table, I use

SELECT [EpiSer], 
       [SINum], 
       [VolNum], 
       [CTPQty], 
       [VolAmt], 
       [CTPActivityGroupCode] 
INTO [_Stage2_Part2] 
FROM [_Stage2] 
WHERE [TilingIdx] = 2; -- This number is changed for each split 1, 2 and 3
GO

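The problem lies in the aggregation query I run on each sub-table: it groups on [EpiSer], which contains duplicates. Splitting this way can put records with the same [EpiSer] into different sub-tables, so when I aggregate a sub-table, some of the records that belong to a group are missing. For reference, the aggregation query [for sub-table 2] is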

SELECT [s1].[EpiSer] as ActivityRecordID, 
       [s1].[CTPActivityGroupCode] as ActCstID, 
       [t].[ResCstID], 
       [s1].[VolAmt], 
       [s1].[CTPQty] AS ActCnt, 
       SUM([s1].[VolAmt] * [t].[OCostUnit]) AS TotOCst, 
       SUM([s1].[VolAmt] * [t].[FCostUnit]) AS TotFCst 
INTO [_Agg2] 
FROM [_Stage2_Part2] AS s1 
    INNER JOIN 
        [DriversCtp] AS t ON [s1].[VolNum] = [t].[VolNum] 
GROUP BY [s1].[EpiSer], 
         [s1].[CTPActivityGroupCode], 
         [t].[ResCstID], 
         [s1].[VolAmt], 
         [s1].[CTPQty];
GO
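
The split problem described above can be made visible with a small check. This is only a sketch: the table variable @Stage1Demo and its six rows are hypothetical stand-ins for [_Stage1], and only [Id] and [EpiSer] are modelled.

```sql
-- Hypothetical demo rows standing in for [_Stage1]; only [Id] and [EpiSer] matter here.
DECLARE @Stage1Demo TABLE
(
    [Id] INT PRIMARY KEY,
    [EpiSer] INT NOT NULL
);

INSERT INTO @Stage1Demo ([Id], [EpiSer])
VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 20), (6, 30);

-- Tile by [Id] as in the question, then list every [EpiSer]
-- whose rows land in more than one tile.
WITH Tiled AS
(
    SELECT [EpiSer],
           NTILE(3) OVER(ORDER BY [Id]) AS [TilingIdx]
    FROM @Stage1Demo
)
SELECT [EpiSer],
       COUNT(DISTINCT [TilingIdx]) AS [TileCount]
FROM Tiled
GROUP BY [EpiSer]
HAVING COUNT(DISTINCT [TilingIdx]) > 1;
```

With these demo rows the tiles are 1, 1, 2, 2, 3, 3 in [Id] order, so [EpiSer] 20 (Ids 3 to 5) is reported as spanning two tiles.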

So my question is: how can I split the original table into N sub-tables while ensuring that records with the same [EpiSer] stay in the same sub-table?

Thanks for your time.

1 Answer:

Answer 0: (score: 2)

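You should be able to do this with an additional UPDATE after the rows have been assigned to groups. Since you are ordering by ID, we can find the minimum group for each value (in your case, each [EpiSer]):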

DECLARE @DataSource TABLE
(
    [id] TINYINT PRIMARY KEY IDENTITY(1,1)
   ,[value] TINYINT
);

INSERT INTO @DataSource ([value])
VALUES (1), (1), (1), (2), (3), (4), (5), (6), (7), (7), (7), (7), (7), (7), (7), (7), (8), (9), (10), (11);


SELECT *
      ,NTILE(3) OVER(ORDER BY Id)  AS [GroupID]
INTO #DataSource
FROM @DataSource;


-- Preview: the smallest GroupID seen for each value
SELECT *
      ,MIN([GroupID]) OVER(PARTITION BY [value]) AS [MinGroupID]
FROM #DataSource;

-- Persist it, so that rows sharing a value end up in the same group
WITH DataSource AS
(
    SELECT [id]
          ,MIN([GroupID]) OVER(PARTITION BY [value]) AS [GroupID]
    FROM #DataSource
)
UPDATE A
SET [GroupID] = B.[GroupID]
FROM #DataSource A
INNER JOIN DataSource B
    ON A.[id] = B.[id];

-- Clean up only after the UPDATE has run
DROP TABLE #DataSource;

I am not sure how this will affect your performance, but I could not figure out how to fix the grouping inline.
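
One possible way to get the same result without a separate UPDATE is to compute NTILE in a derived table and take the minimum per key in the outer query. This is a sketch under assumptions, not a tested solution for the real data: @Stage1Demo and its rows are hypothetical stand-ins for [_Stage1], and only three of the original columns are shown.

```sql
-- Hypothetical demo rows standing in for [_Stage1].
DECLARE @Stage1Demo TABLE
(
    [Id] INT PRIMARY KEY,
    [EpiSer] INT NOT NULL,
    [VolNum] INT NOT NULL
);

INSERT INTO @Stage1Demo ([Id], [EpiSer], [VolNum])
VALUES (1, 10, 100), (2, 10, 101), (3, 20, 100),
       (4, 20, 102), (5, 20, 101), (6, 30, 100);

-- Inner query: tile by [Id], as in the question.
-- Outer query: give every row the smallest tile seen for its [EpiSer],
-- so all rows of one [EpiSer] share a single [TilingIdx].
SELECT [t].[EpiSer],
       [t].[VolNum],
       MIN([t].[RawTilingIdx]) OVER(PARTITION BY [t].[EpiSer]) AS [TilingIdx]
FROM
(
    SELECT [Id],
           [EpiSer],
           [VolNum],
           NTILE(3) OVER(ORDER BY [Id]) AS [RawTilingIdx]
    FROM @Stage1Demo
) AS t
ORDER BY [t].[Id];
```

Adding an INTO clause to the outer SELECT would presumably produce the staging table directly. Note that, as with the UPDATE approach, the resulting groups are no longer guaranteed to be the same size, because every key is pulled into its earliest tile.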

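Also, if you are using SQL Server 2012+, take a look at columnstore indexes; they can be used to optimize aggregations over large tables. Note that clustered columnstore indexes are available from SQL Server 2014 onward; SQL Server 2012 supports only nonclustered columnstore indexes, which make the table read-only.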
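
As a rough illustration of the columnstore suggestion, the sketch below creates a clustered columnstore index on a small demo table and runs a grouped aggregate against it. The table [dbo].[Stage2Demo], its column types, and the index name are all assumptions made for the example; it requires SQL Server 2014 or later, and whether it helps on the real data would need to be measured.

```sql
-- Hypothetical demo table standing in for one of the staging tables;
-- the column types are assumptions.
CREATE TABLE [dbo].[Stage2Demo]
(
    [EpiSer] INT NOT NULL,
    [VolNum] INT NOT NULL,
    [VolAmt] DECIMAL(18, 4) NOT NULL
);

-- Store the whole table in columnar format (SQL Server 2014+).
CREATE CLUSTERED COLUMNSTORE INDEX [CCI_Stage2Demo]
    ON [dbo].[Stage2Demo];

INSERT INTO [dbo].[Stage2Demo] ([EpiSer], [VolNum], [VolAmt])
VALUES (10, 100, 1.5), (10, 101, 2.0), (20, 100, 3.25);

-- Aggregations that scan a few columns over many rows are the
-- typical workload a columnstore index is designed for.
SELECT [EpiSer],
       SUM([VolAmt]) AS [TotVolAmt]
FROM [dbo].[Stage2Demo]
GROUP BY [EpiSer];

DROP TABLE [dbo].[Stage2Demo];
```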