How to perform a 10% / 45% / 45% split in SQL Server

Time: 2014-05-16 18:12:19

Tags: sql sql-server database split

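I'm trying to figure out how to do a 10% / 45% / 45% split in SQL Server. I came up with an approach that uses NTILE to assign each row to one of 20 tiles and then maps the tiles to groups with simple math, but the solution breaks down when fewer than 20 records are identified.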

SELECT
    Email, 
    CASE 
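       WHEN [Group] IN (1,2) THEN 'Group1'
       WHEN [Group] BETWEEN 3 AND 11 THEN 'Group2'
       WHEN [Group] BETWEEN 12 AND 20 THEN 'Group3'
    END AS [Group]
FROM
   (SELECT
        email, optDate, 
        -- GROUP is a reserved word, so the alias has to be bracketed
        NTILE(20) OVER(ORDER BY NEWID()) AS [Group]
    FROM myTable  -- placeholder: the question does not name the table
   ) T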

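I also have the annoying constraint that I can't use temp tables or declare variables: my solution has to start with a SELECT statement. I can build the result in stages and use the results of stage 1 in the stage 2 query, but I'm struggling to find a good solution.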
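
To make the failure mode concrete, here is a small Python sketch (not T-SQL) that imitates how NTILE hands out rows and then applies the same tile-to-group mapping as the query above. The helper names `ntile` and `group_sizes` are made up for this illustration, and the sketch assumes the documented NTILE behavior where tile sizes differ by at most one row and the larger tiles come first.

```python
def ntile(row_count, buckets):
    """Return the 1-based tile number for each of row_count ordered rows."""
    base, extra = divmod(row_count, buckets)
    assignment = []
    for bucket in range(1, buckets + 1):
        # The first `extra` tiles receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        assignment.extend([bucket] * size)
    return assignment


def group_sizes(row_count):
    """Map tiles 1-2, 3-11 and 12-20 to the three groups and count the rows."""
    counts = {"Group1": 0, "Group2": 0, "Group3": 0}
    for tile in ntile(row_count, 20):
        if tile <= 2:
            counts["Group1"] += 1
        elif tile <= 11:
            counts["Group2"] += 1
        else:
            counts["Group3"] += 1
    return counts


if __name__ == "__main__":
    for n in (40, 20, 10):
        print(n, group_sizes(n))
```

With 10 rows only tiles 1 through 10 are ever used, so Group1 gets 2 rows (20%), Group2 gets 8 rows, and Group3 stays empty, which is the breakdown described in the question.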

2 answers:

Answer 0: (score: 2)

Row_Number should be enough

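WITH A AS (
  SELECT TOP 1000000
         email, optDate
         -- ID follows email/optDate order, so group membership is deterministic;
         -- order the ROW_NUMBER by NEWID() instead if the split itself must be random
       , ID = Row_Number() OVER (ORDER BY email, optDate)
       , Items = COUNT(*) OVER ()  -- total row count, repeated on every row
  FROM   myTable
  ORDER BY NEWID()
)
SELECT
    Email, 
    CASE 
       -- <= rather than <, so that e.g. 20 rows split exactly 2 / 9 / 9
       WHEN ID <= Items * 0.1 THEN 'Group1'
       WHEN ID <= Items * 0.55 THEN 'Group2'
       ELSE 'Group3'
    END AS [Group]
FROM A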

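The multipliers in the CASE are cumulative: each threshold has to include the shares of all the groups before it. The second group therefore runs from Items * 0.10 to Items * 0.55, a span of Items * 0.45.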
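
The cumulative-threshold idea can be sketched outside SQL as well. The Python below is only an illustration: the function names are hypothetical, the shares are written as whole percentages so the arithmetic stays in integers, and the comparison is inclusive (`<=`), which is an assumption chosen so that 20 rows split exactly 2 / 9 / 9.

```python
from itertools import accumulate


def cumulative_cutoffs(shares_percent):
    """Turn per-group shares (10, 45, 45) into running cutoffs (10, 55, 100)."""
    return list(accumulate(shares_percent))


def assign_group(row_number, total_rows, shares_percent=(10, 45, 45)):
    """Return the group name for a 1-based row number."""
    cutoffs = cumulative_cutoffs(shares_percent)
    for index, cutoff in enumerate(cutoffs, start=1):
        # Same test as row_number <= total_rows * cutoff / 100, in integers.
        if row_number * 100 <= cutoff * total_rows:
            return f"Group{index}"
    return f"Group{len(shares_percent)}"


def split_sizes(total_rows):
    """Count how many rows land in each group."""
    sizes = {}
    for row_number in range(1, total_rows + 1):
        group = assign_group(row_number, total_rows)
        sizes[group] = sizes.get(group, 0) + 1
    return sizes


if __name__ == "__main__":
    print(cumulative_cutoffs([10, 45, 45]))
    print(split_sizes(20))
    print(split_sizes(7))
```

Unlike the fixed 20-tile approach, the thresholds scale with the row count, so small inputs still produce a sensible split: 7 rows end up as 0 / 3 / 4 rather than leaving the last group empty.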

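The TOP in the CTE is needed in order to use ORDER BY there. TOP 100 PERCENT does not actually order the result set, so you have to use a number at least equal to the number of rows the query returns.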

If you can't even use a CTE, replace A in the main query with a subquery that has the same definition:

SELECT
    Email, 
    CASE 
       -- <= rather than <, so that e.g. 20 rows split exactly 2 / 9 / 9
       WHEN ID <= Items * 0.1 THEN 'Group1'
       WHEN ID <= Items * 0.55 THEN 'Group2'
       ELSE 'Group3'
    END AS [Group]
FROM (SELECT TOP 1000000
             email, optDate
           , ID = Row_Number() OVER (ORDER BY email, optDate)
           , Items = COUNT(*) OVER ()  -- total row count, repeated on every row
      FROM   myTable
      ORDER BY NEWID()
     ) A

Answer 1: (score: 2)

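I handle these problems with an explicit row count and row numbers. The following uses newid() to assign sequence numbers in random order. The rest is just arithmetic: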

SELECT Email,
       (CASE WHEN seqnum <= cnt * 0.10 THEN 'Group1'
             WHEN seqnum <= cnt * (0.10 + 0.45) THEN 'Group2'
             ELSE 'Group3'
        END) as [Group]
FROM (SELECT email, optDate, 
             row_number() over (order by newid()) as seqnum,
             count(*) over () as cnt
      FROM t
     ) t;

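As a note: there has been some discussion suggesting that checksum(newid()) is actually better than newid() for random ordering. (Others even recommend rand(checksum(newid())).) For your purposes, any of these is probably sufficient.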
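
For readers who want to check the arithmetic outside the database, the same idea (shuffle, number the rows, compare against the total) can be mimicked in a few lines of Python. This is a rough sketch, not a translation of the query: `random_split` and its `seed` parameter are hypothetical, `random.shuffle` merely stands in for ordering by newid(), and the thresholds are written as integer percentages to avoid floating-point rounding.

```python
import random


def random_split(emails, seed=None):
    """Shuffle the emails, then split them 10% / 45% / 45% by position."""
    rng = random.Random(seed)
    shuffled = list(emails)
    rng.shuffle(shuffled)   # stands in for ORDER BY newid()
    cnt = len(shuffled)     # stands in for count(*) over ()
    groups = {"Group1": [], "Group2": [], "Group3": []}
    for seqnum, email in enumerate(shuffled, start=1):
        if seqnum * 100 <= cnt * 10:
            groups["Group1"].append(email)
        elif seqnum * 100 <= cnt * 55:
            groups["Group2"].append(email)
        else:
            groups["Group3"].append(email)
    return groups


if __name__ == "__main__":
    sample = [f"user{i}@example.com" for i in range(20)]
    for name, members in random_split(sample, seed=1).items():
        print(name, len(members))
```

Passing a seed makes a run repeatable, which is handy for testing; the SQL versions have no equivalent knob, since newid() produces fresh values on every execution.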