将套装分成不均匀的百分比桶

时间:2018-10-16 08:50:17

标签: tsql set ssms sql-server-2016

每天我都会返回一组x行(5到2000之间)。

我需要根据规则从该集合更新一列。我认为这个(并非完全有效的)示例演示了这一点

/* 
    35% a
    25% b
    30% c
    10% null
*/

WITH tally
(vals, updateThis, bucket)
AS
(
    SELECT
         DATEADD(DAY, - ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), GETDATE())
        , NULL
        , NTILE(100) OVER (ORDER BY (SELECT NULL))
    FROM
    (
        VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS a(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS b(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS c(n)
    )
--UPDATE
    --SET updateThis
, updated
AS
(
    SELECT
     t.vals
    , CASE
        WHEN t.bucket <= 35 THEN 'a'
        WHEN t.bucket > 35 AND t.bucket <=60 THEN 'b'
        WHEN t.bucket > 60 AND t.bucket <=90 THEN 'c'
        WHEN t.bucket > 60 AND t.bucket <=90 THEN 'NULL'
    END AS updated
    , t.bucket
    FROM tally t
)
SELECT 
    U.updated
    , COUNT(1) AS actual
FROM 
updated u
GROUP BY U.updated

此解决方案并不精确,即使a + b + c构成了100%,它也可能不会更新所有行。同样,它对于少于100行的集合也不起作用。

我当前的解决方案是:

  • 计算总行数
  • 计算实际需要的行(CEILING((@totalRows * ratio) / 100)
  • 在WHILE LOOP中更新最终集,选择当前值和所需的行。

有没有更好的基于集合的解决方案,可以帮助我摆脱循环?

1 个答案:

答案 0 :(得分:2)

不知道,如果我没听错的话...

首先,这里似乎有一个相当明显的错误:

    WHEN t.bucket > 60 AND t.bucket <=90 THEN 'NULL'

这不是这个吗?

    WHEN t.bucket >90 THEN 'NULL'

函数NTILE会将您的集合分散到多个桶中。检查我的输出,并找出在极端情况下的行为。我建议像这样使用每行的计算百分比:

WITH tally
(vals, bucket)
AS
(
    SELECT
         DATEADD(DAY, - ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), GETDATE())
        ,NTILE(100) OVER (ORDER BY (SELECT NULL))
    FROM
    (
        VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS a(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS b(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS c(n)
    )
SELECT *
INTO #tmpBuckets
FROM Tally;

-我使用此#tmpBuckets-table来更接近您的我有一张桌子场景

WITH Numbered AS
(
    SELECT *
          ,ROW_NUMBER() OVER(ORDER BY vals DESC) / ((SELECT COUNT(*) FROM #tmpBuckets)/100.0)  AS RunningPercentage
    FROM #tmpBuckets
)
,ComputeBuckets AS
(
    SELECT
     t.*
    , CASE
        WHEN t.RunningPercentage <= 35 THEN 'a'
        WHEN t.RunningPercentage > 35 AND t.RunningPercentage <=60 THEN 'b'
        WHEN t.RunningPercentage > 60 AND t.RunningPercentage <=90 THEN 'c'
        WHEN t.RunningPercentage >90  THEN 'NULL'
    END AS ShnugoMethod
    , CASE
        WHEN t.bucket <= 35 THEN 'a'
        WHEN t.bucket > 35 AND t.RunningPercentage <=60 THEN 'b'
        WHEN t.bucket > 60 AND t.RunningPercentage <=90 THEN 'c'
        WHEN t.bucket > 90  THEN 'NULL'
    END AS ZikatoMethod
    FROM Numbered t
)
SELECT cb.*
FROM ComputeBuckets cb
ORDER BY cb.vals DESC

GO
DROP TABLE #tmpBuckets;

我想您知道如何使用这种cte更新源表。否则,请再回答一个问题:-)