Create evenly sized groups based on an aggregate

Time: 2013-07-26 19:50:23

Tags: sql-server sql-server-2012

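This is probably a newbie question, but I want to split our server inventory into several groups of roughly equal size based on total database size, and I am having trouble figuring out how to group them. I thought NTILE might work, but I can't wrap my head around splitting the totals evenly. My example below just orders the servers randomly. I would like the result to be 3 groups of fairly even size (obviously not exact).

Using SQL Server 2012. Any help is appreciated. Thanks.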

declare @Servers table (ServerName sysname, TotalSizeGB decimal (12,2))
insert into @Servers values
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)

select GroupNumber, sum(TotalSizeGB) as TotalSizeGB
from (
     select ServerName, sum(TotalSizeGB) as TotalSizeGB, ntile(3) over (order by newid()) as GroupNumber
     from (
          select ServerName, TotalSizeGB from @Servers
          ) x 
     group by ServerName
     ) y
group by GroupNumber

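The expected output here would be three groups of about 2000 GB each. I don't expect it to be exact, but it should at least be close. Grouped per server, it might look like this: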

ServerName  TotalSizeGB  GroupNumber
Server10    1023.35      1
Server1     123.45       1
Server5     567.89       1
Server3     345.67       1
Server4     456.78       2
Server7     789.01       2
Server6     678.90       2
Server2     234.56       3
Server9     901.23       3
Server8     890.12       3

If I take the sum per group, it would look like this:

GroupNumber  TotalSizeGB
1            2060.36
2            1924.69
3            2025.91
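
For reference, NTILE balances the number of rows per bucket, not the sum of a column, which is why it cannot produce even totals on its own. A minimal sketch, assuming the sample data above and ordering by size instead of newid() so the result is repeatable:

```sql
DECLARE @Servers TABLE (ServerName sysname, TotalSizeGB decimal(12,2));
INSERT INTO @Servers VALUES
('Server1',123.45),('Server2',234.56),('Server3',345.67),('Server4',456.78),
('Server5',567.89),('Server6',678.90),('Server7',789.01),('Server8',890.12),
('Server9',901.23),('Server10',1023.35);

-- NTILE(3) over 10 rows yields buckets of 4, 3 and 3 rows,
-- regardless of how large the values in each bucket are.
SELECT GroupNumber,
       COUNT(*)         AS ServerCount,
       SUM(TotalSizeGB) AS TotalSizeGB
FROM (
    SELECT TotalSizeGB,
           NTILE(3) OVER (ORDER BY TotalSizeGB DESC) AS GroupNumber
    FROM @Servers
) t
GROUP BY GroupNumber
ORDER BY GroupNumber;
-- GroupNumber  ServerCount  TotalSizeGB
-- 1            4            3603.71
-- 2            3            1703.57
-- 3            3             703.68
```

The row counts are as even as they can be, while the totals range from about 700 GB to about 3600 GB.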

4 answers:

Answer 0 (score: 1)

SELECT  *
FROM(
    SELECT  y.TotalSizeGB,
            CASE 
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=0 THEN 2
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=1 THEN 1
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=2 THEN 0
                ELSE y.PseudoGrpNumber
            END GrpNumber
    FROM(
        SELECT 
            x.ServerName,
            x.TotalSizeGB,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))%3 PseudoGrpNumber,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))/3 AnotherGrp,
            ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC) RowNum
        FROM    @Servers x
    )y
)z
PIVOT( SUM(z.TotalSizeGB) FOR z.GrpNumber IN([0],[1],[2]) ) pvt;

Result:

0       1       2
------- ------- -------
2048.02 1925.80 2037.14

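Some explanation:

The idea is to sort the data in descending order of the TotalSizeGB column. Every 3 consecutive rows then form one pass (column AnotherGrp), and the group numbers are handed out within each pass, first in DESC order and then in ASC order (column PseudoGrpNumber, reversed on even passes by the CASE expression). Running the derived table y on its own (SELECT * FROM (...) y) shows these intermediate columns for every row.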
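
A minimal sketch of that intermediate query, with the final group number added per server, assuming the question's sample data. The `2 - PseudoGrpNumber` shortcut is my condensation of the three-branch CASE above, not the answer's original wording:

```sql
DECLARE @Servers TABLE (ServerName sysname, TotalSizeGB decimal(12,2));
INSERT INTO @Servers VALUES
('Server1',123.45),('Server2',234.56),('Server3',345.67),('Server4',456.78),
('Server5',567.89),('Server6',678.90),('Server7',789.01),('Server8',890.12),
('Server9',901.23),('Server10',1023.35);

SELECT  y.RowNum, y.ServerName, y.TotalSizeGB, y.AnotherGrp, y.PseudoGrpNumber,
        -- On even passes the order is reversed: 2 - n maps 0,1,2 to 2,1,0
        CASE WHEN y.AnotherGrp % 2 = 0 THEN 2 - y.PseudoGrpNumber
             ELSE y.PseudoGrpNumber
        END AS GrpNumber
FROM (
    SELECT  x.ServerName,
            x.TotalSizeGB,
            (2 + ROW_NUMBER() OVER (ORDER BY x.TotalSizeGB DESC)) % 3 AS PseudoGrpNumber,
            (2 + ROW_NUMBER() OVER (ORDER BY x.TotalSizeGB DESC)) / 3 AS AnotherGrp,
            ROW_NUMBER() OVER (ORDER BY x.TotalSizeGB DESC) AS RowNum
    FROM    @Servers x
) y
ORDER BY y.RowNum;
```

For rows 1 to 10 the GrpNumber column reads 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, so the largest server of one pass is paired with the smallest of the next. Summing TotalSizeGB per GrpNumber gives the pivoted totals shown under Result.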

Answer 1 (score: 1)

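This task is actually a scientific one (a Packing problem, or something of that kind) and might be better suited to math.stackexchange :)

My solution has two steps (as many optimization problems do): find some initial solution, then try to refine it.

Initial solution: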

ServerName GroupNo     TotalSizeGB
---------- ----------- -----------
Server1    3           123.45
Server2    3           234.56
Server3    2           345.67
Server4    1           456.78
Server5    2           567.89
Server6    1           678.90
Server7    3           789.01
Server8    3           890.12
Server9    1           901.23
Server10   2           1023.35

GroupNo     GroupSizeGb
----------- -----------
1           2036.91
2           1936.91
3           2037.14

Optimized:

ServerName GroupNo     TotalSizeGB
---------- ----------- -----------
Server1    3           123.45
Server2    3           234.56
Server3    2           345.67
Server4    1           456.78
Server5    3           567.89
Server6    1           678.90
Server7    2           789.01
Server8    2           890.12
Server9    1           901.23
Server10   3           1023.35

GroupNo     GroupSizeGb
----------- -----------
1           2036.91
2           2024.80
3           1949.25

Unfortunately, I could not set this up on SQLFiddle because it uses explicit transactions.

set nocount on

-- Parameters
declare
  @nGroups int, -- Number of groups to split the servers into
  @tolerance float, -- e.g. 0.0 ... 0.1 (0.1 means a deviation of +/-10% from the target group size is allowed)
  @nTries int, -- refinement tries: 100, 1000, 10000, or as many as you can wait for if the initial solution is not good enough
  @mFactor float, -- refinement param 0.0 ... 1.0: fraction of servers picked as swap candidates on each try
  @tolerance2 float -- e.g. 0.1 ... 0.3: max relative size difference between two servers that may be swapped

set @nGroups = 3
set @tolerance = 0
set @nTries = 1000
set @mFactor = 0.3
set @tolerance2 = 0.3


-- Initial Data
create table #Servers (ID int identity, ServerName sysname, TotalSizeGB decimal (12,2), primary key clustered(ID))

insert into #Servers (ServerName, TotalSizeGB) values
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)

create table #Groups (GroupNo int not NULL, primary key clustered (GroupNo))
insert into #Groups (GroupNo)
select N from (select row_number() over (order by @@spid) from sys.all_columns) S(N) where N <= @nGroups

create table #ServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered(ServerID))
create index #IX_GroupServers_GroupNo on #ServerGroups (GroupNo)

declare
    @srvCnt int,
    @grSize decimal (12,2),
    @grNo int,
    @grSz decimal (12,2),
    @srvID int

select @srvCnt = count(1), @grSize = sum(TotalSizeGB) / @nGroups from #Servers
select @grSize as [Target approx. group size]

-- Find initial solution
while (select count(1) from #ServerGroups) < @srvCnt
begin
    select top 1 @grNo = g.GroupNo
    from #Groups g
        left join #ServerGroups sg on sg.GroupNo = g.GroupNo
        left join #Servers s on s.ID = sg.ServerID
    group by g.GroupNo
    order by sum(s.TotalSizeGB)

    select @grSz = IsNull(sum(s.TotalSizeGB), 0)
    from #Groups g
        left join #ServerGroups sg on sg.GroupNo = g.GroupNo
        left join #Servers s on s.ID = sg.ServerID
    where g.GroupNo = @grNo

    select top 1 @srvID = ID
    from #Servers s
    where not exists (select 1 from #ServerGroups where ServerID = s.ID)
    order by abs(@grSize - @grSz - s.TotalSizeGB)

    insert into #ServerGroups (ServerID, GroupNo) values (@srvID, @grNo)
end

select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
from #Groups g
    join #ServerGroups sg on sg.GroupNo = g.GroupNo
    join #Servers s on s.ID = sg.ServerID
group by g.GroupNo


-- Refinement
declare @fTarg float

select @fTarg = sum(abs(case when abs(re) > @tolerance then re else 0 end))
from (
    select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
    from #Groups g
        join #ServerGroups sg on sg.GroupNo = g.GroupNo
        join #Servers s on s.ID = sg.ServerID
    group by g.GroupNo
) t
cross apply (select (GroupSizeGb - @grSize)/@grSize re) p

print @fTarg

if @fTarg > 0
begin

create table #MServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered (ServerID))
insert into #MServerGroups
select ServerID, GroupNo from #ServerGroups

while @nTries > 0
begin
    set @nTries = @nTries - 1

    begin transaction

    ;with MS as (
        select top (100*@mFactor) percent ServerID, GroupNo
        from #MServerGroups
        order by checksum(newid())
    )
    update msg
    set
        msg.GroupNo = case when msg.ServerID = tt.ServerID1 then tt.NewNo1 else tt.NewNo2 end
    from
        #MServerGroups msg
        join (
            select ServerID1, NewNo1, ServerID2, NewNo2
            from (
                select MS.ServerID as ServerID1, SS.GroupNo as NewNo1, SS.ServerID as ServerID2, MS.GroupNo as NewNo2, row_number() over (partition by SS.ServerID order by @@spid) as rn
                from MS
                    join #Servers s on s.ID = MS.ServerID
                    cross apply (
                        select top 1 *
                        from
                            #Servers s2
                            join #MServerGroups ms2 on ms2.ServerID = s2.ID
                        where
                            s2.ID != MS.ServerID and ms2.GroupNo != MS.GroupNo and abs(s2.TotalSizeGB - s.TotalSizeGB)/s.TotalSizeGB < @tolerance2
                        order by checksum(newid())
                    ) SS
            ) t
            where rn = 1
        )tt on msg.ServerID in (tt.ServerID1, tt.ServerID2)

    if @@rowcount = 0
    begin
        rollback transaction
        continue;
    end

    declare @fT float

    select @fT = sum(abs(case when abs(re) > @tolerance then re else 0 end))
    from (
        select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
        from #Groups g
            join #MServerGroups sg on sg.GroupNo = g.GroupNo
            join #Servers s on s.ID = sg.ServerID
        group by g.GroupNo
    ) t
    cross apply (select (GroupSizeGb - @grSize)/@grSize re) p

    if @fT < @fTarg
    begin
        set @fTarg = @ft
        print @fTarg -- the less this number, the better solution is

        commit transaction
    end
    else
        rollback transaction
end

update s
set s.GroupNo = m.GroupNo
from #MServerGroups m
    join #ServerGroups s on s.ServerID = m.ServerID

select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
from #Groups g
    join #ServerGroups sg on sg.GroupNo = g.GroupNo
    join #Servers s on s.ID = sg.ServerID
group by g.GroupNo

drop table #MServerGroups

end
else
    print 'No refinement needed'

drop table #Groups
drop table #ServerGroups
drop table #Servers

I suggest starting with @nTries = 0 and a reasonable @tolerance (for example 0.1 or 0.05).
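
For readers tracing the refinement loop, the score it prints can be written out. This is my reading of the @fTarg and @fT queries above; the symbols $S_g$, $T$, $r_g$, $\tau$ and $f$ are introduced here for illustration and do not appear in the answer:

```latex
\[
T = \frac{1}{n}\sum_{i}\mathrm{TotalSizeGB}_i, \qquad
r_g = \frac{S_g - T}{T}, \qquad
f = \sum_{g=1}^{n} \lvert r_g \rvert \cdot \mathbf{1}\bigl[\lvert r_g \rvert > \tau\bigr]
\]
% n   = @nGroups, S_g = total size of group g,
% T   = @grSize (target group size), tau = @tolerance
\[
f_{\text{initial}} \approx \frac{33.26 + 66.74 + 33.49}{2003.65} \approx 0.0666, \qquad
f_{\text{optimized}} \approx \frac{33.26 + 21.15 + 54.40}{2003.65} \approx 0.0543
\]
```

A trial swap is committed only when it lowers $f$; otherwise the transaction is rolled back. The two numeric values assume $\tau = 0$ and use the group sums from the two listings above with $T = 2003.65$.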

Answer 2 (score: 0)

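Check this out, I hope it helps. I'm not sure what you mean by 'evenly sized groups'. What I do here is first assign an even size to each group and then, if there is any remainder, assign it to the group that has more items than the normal group size. I would suggest deciding the group number yourself (perhaps with a stored procedure) and assigning a size to each server, rather than using ntile. For the problem as described, though, the code below may be fine. Note that I have not tested all scenarios.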

  declare @TotalSizeGB decimal;
  select  @TotalSizeGB = sum(TotalSizeGB) from @Servers;

  declare @Count int;
  select  @Count = count(TotalSizeGB) from @Servers;

  declare @GroupSize int;
  select  @GroupSize = 3;

  declare @NoofGroups int;
  select  @NoofGroups = 3;

  declare @UnitSizeGB decimal
  Set @UnitSizeGB =(@TotalSizeGB/@Count)*@NoofGroups;

  Declare @Remainder decimal;
  Set @Remainder = @TotalSizeGB-(@UnitSizeGB*@NoofGroups)  

Select GroupNumber,
   CASE 
      WHEN gcount = @GroupSize THEN @UnitSizeGB
      WHEN gcount > @GroupSize THEN @UnitSizeGB+@Remainder
   END as Size
 From (
 Select 
   GroupNumber,count(ServerName) as gcount,  @UnitSizeGB as UnitSizeGB from(
     Select ServerName,ntile(@GroupSize) over (order by newid()) as GroupNumber
     from (
          select ServerName, TotalSizeGB from @Servers ) x 
     group by ServerName  ) as d
     group by GroupNumber ) as ff

This gives the output:

 GroupNumber      Size
 1                2405
 2                1803
 3                1803 

Answer 3 (score: 0)

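Here is a solution that produces the same results as @i-one's code, but may be easier to understand (at least for me). I use 'chunk' instead of 'group' to avoid keyword conflicts.

The premise is as follows. To create n evenly sized chunks:

  1. Sort all records in descending order
  2. Number the rows in that order
  3. Assign the first n records to their own chunks
  4. Loop through the rest, always assigning to the smallest chunk

I've uploaded the code to SQLFiddle, but it doesn't seem to like table variables. Here's the link anyways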

    -- Source data:
    DECLARE @Servers TABLE (ServerName SYSNAME, TotalSizeGB DECIMAL (12,2))
    INSERT INTO @Servers VALUES
    ('Server1',123.45),
    ('Server2',234.56),
    ('Server3',345.67),
    ('Server4',456.78),
    ('Server5',567.89),
    ('Server6',678.90),
    ('Server7',789.01),
    ('Server8',890.12),
    ('Server9',901.23),
    ('Server10',1023.35)
    
    
    -- Solution start
    DECLARE @ServersChunked TABLE (
            ServerName SYSNAME,
            TotalSizeGB DECIMAL (12,2),
            RowNum INT,
            ChunkNo INT
        );
    DECLARE
        @ChunkCount INT = 3,
        @MinRowNum INT,
        @SmallestChunk INT;
    
    
    -- Copy table into variable (skip this if the original table can be amended to include the RowNum and ChunkNo fields)
    INSERT INTO @ServersChunked
    SELECT 
        *, 
        RowNum = ROW_NUMBER() OVER (ORDER BY TotalSizeGB DESC), 
        ChunkNo = NULL
    FROM @Servers
    
    -- Seed each chunk with one of the largest servers
    UPDATE @ServersChunked
    SET ChunkNo = RowNum
    WHERE RowNum <= @ChunkCount
    
    
    -- Assign chunks to the remaining servers
    WHILE EXISTS (SELECT 1 FROM @ServersChunked WHERE ChunkNo IS NULL) BEGIN
    
        -- Find the next unassigned server (largest remaining TotalSizeGB)
        SELECT @MinRowNum = MIN(RowNum) FROM @ServersChunked WHERE ChunkNo IS NULL
    
        -- Find the smallest chunk
        SELECT TOP 1 @SmallestChunk = ChunkNo
        FROM @ServersChunked
        WHERE ChunkNo IS NOT NULL
        GROUP BY ChunkNo
        ORDER BY Sum(TotalSizeGB) ASC
    
        -- Assign the server to that chunk
        UPDATE @ServersChunked
        SET ChunkNo = @SmallestChunk
        WHERE RowNum = @MinRowNum
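    END
    
    -- Summarize the total size per chunk
    SELECT ChunkNo, SumTotalSizeGB = SUM(TotalSizeGB)
    FROM @ServersChunked
    GROUP BY ChunkNo
    ORDER BY ChunkNo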
    

    The results are as follows:

    ChunkNo SumTotalSizeGB
    1       1936.91
    2       2036.91
    3       2037.14