NTILE alternative for a non-uniformly distributed data set

Date: 2015-09-16 08:08:49

Tags: sql sql-server tsql sql-server-2012

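I have a data set that I want to display, but it can be very large (thousands of points), so I want to reduce the number of points. For example, this is the output with 1000+ points: [image]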

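Currently I use NTILE to get an approximation, but if the points are not uniformly distributed it does not work as expected. I get this output (NTILE parameter 100):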

[image]

How can I avoid this behaviour? The SQL stored procedure is below:

ALTER PROCEDURE [dbo].[usp_GetSystemHealthCheckData]
    @DateFrom datetime,         
    @DateTo datetime,            
    @EstimatedPointCount int
    with recompile
AS

BEGIN   
    SET NOCOUNT ON;
    set arithabort on

    if @DateFrom IS NULL
        RAISERROR ('@DateFrom cannot be NULL', 16, 1)

    if @DateTo IS NULL
        RAISERROR ('@DateTo cannot be NULL', 16, 1)     

    if @EstimatedPointCount IS NULL
        RAISERROR ('@EstimatedPointCount cannot be NULL', 16, 1)    

    ;With T as
    (
        SELECT *, GroupId = NTILE(@EstimatedPointCount) over (order by GeneratedOnUtc)
        FROM SystemHealthCheckData
        WHERE GeneratedOnUtc between @DateFrom AND @DateTo
    )

    SELECT  CpuPercentPayload = AVG(CpuPercentPayload),
            FreeRamMb = AVG(FreeRamMb),
            FreeDriveMb = AVG(FreeDriveMb),
            GeneratedOnUtc = CAST(AVG(CAST(GeneratedOnUtc AS DECIMAL( 18, 6))) AS DATETIME)
    FROM T
    GROUP BY GroupId
END

1 answer:

Answer 0 (score: 2)

EDIT: New approach

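You could use NTILE to split your data into groups and then calculate the average for each group. I used 4 as the group count, so the query returns 4 average values. The number of groups can be calculated from the number of points you have, or you can fix it.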

Something like this:

DECLARE @tbl TABLE(id INT IDENTITY, nmbr FLOAT);
INSERT INTO @tbl VALUES(5),(4.5),(4),(3.5),(3),(2.5),(2),(1.5),(1),(1.5),(1),(0.5),(0),(13),(2),(17),(5),(22),(24),(2),(3),(11);

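SELECT tbl2.* 
      ,AVG(nmbr) OVER(PARTITION BY tbl2.tile) AS TileAvg -- average of the tile this row belongs to
FROM
(
    SELECT tbl.*
          ,NTILE(4) OVER(ORDER BY id) AS tile -- split the rows into 4 groups of (nearly) equal size
    FROM @tbl AS tbl
)AS tbl2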

If you want to reduce this to one value per group, you could try this:

SELECT AVG(nmbr) AS TileAvg, tbl2.tile
FROM
(
    SELECT tbl.*
          ,NTILE(4) OVER(ORDER BY id) AS tile
    FROM @tbl AS tbl
)AS tbl2
GROUP BY tbl2.tile
ORDER BY tbl2.tile -- return the groups in tile order
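
The remark that the group count can be calculated from the number of points could be sketched roughly as follows. This is only an illustration: @rowsPerGroup is a hypothetical parameter that is not part of the original answer, and the rounding rule is an assumption.

```sql
DECLARE @tbl TABLE(id INT IDENTITY, nmbr FLOAT);
INSERT INTO @tbl VALUES(5),(4.5),(4),(3.5),(3),(2.5),(2),(1.5),(1),(1.5),(1),(0.5),(0),(13),(2),(17),(5),(22),(24),(2),(3),(11);

-- Hypothetical parameter: roughly how many rows each averaged point should cover
DECLARE @rowsPerGroup INT = 5;

-- Derive the group count from the row count, rounding up
DECLARE @groupCount INT;
SELECT @groupCount = CEILING(COUNT(*) / CAST(@rowsPerGroup AS FLOAT)) FROM @tbl;

-- NTILE requires a positive argument, so guard against an empty table
IF @groupCount < 1 SET @groupCount = 1;

SELECT tbl2.tile
      ,AVG(tbl2.nmbr) AS TileAvg
FROM
(
    SELECT tbl.*
          ,NTILE(@groupCount) OVER(ORDER BY id) AS tile
    FROM @tbl AS tbl
) AS tbl2
GROUP BY tbl2.tile
ORDER BY tbl2.tile;
```

Note that this still puts the same number of rows into every group, so it controls the number of output points but does not by itself compensate for unevenly spaced points.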

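Old text:

You might want to think about a sliding average. In this example I tried to rebuild your values (a long linear descent and wild jumps at the end). You can set the @pre and @post variables to control the degree of "flattening".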

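In short: for each element, the average is calculated over the element itself and its direct neighbours (up to @pre rows before and @post rows after it).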

Be aware that you must add an ORDER BY to avoid results coming back in random order...

DECLARE @tbl TABLE(id INT IDENTITY, nmbr FLOAT);
INSERT INTO @tbl VALUES(5),(4.5),(4),(3.5),(3),(2.5),(2),(1.5),(1),(1.5),(1),(0.5),(0),(13),(2),(17),(5),(22),(24),(2),(3),(11);

DECLARE @pre INT=3;
DECLARE @post INT=3;

SELECT tbl.*
      ,AvgBorders.*
      ,AvgSums.* 
      ,AvgSlide.*
FROM @tbl AS tbl
CROSS APPLY
(
    SELECT tbl.id-@pre AS AvgStart
          ,tbl.id + @post AS AvgEnd
) AS AvgBorders
CROSS APPLY
(
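    SELECT COUNT(nmbr) AS CountNmbr
          ,SUM(nmbr) AS SumNmbr 
    FROM @tbl AS win -- separate alias so the outer tbl is not shadowed
    WHERE win.id BETWEEN AvgStart AND AvgEnd
) as AvgSums
CROSS APPLY
(
    select AvgSums.SumNmbr / AvgSums.CountNmbr As AvgValue
) As AvgSlide
ORDER BY tbl.id -- make the output order deterministic
;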
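
On SQL Server 2012 and later, the same kind of sliding average can probably be written more compactly with a window frame. This is a sketch rather than the answer's original method, and it makes two assumptions: the frame bounds are written as literals (3 and 3) because, as far as I know, SQL Server does not accept variables such as @pre and @post in a ROWS clause, and the result only matches the CROSS APPLY version when the id values have no gaps, since ROWS counts rows while the query above compares id values.

```sql
DECLARE @tbl TABLE(id INT IDENTITY, nmbr FLOAT);
INSERT INTO @tbl VALUES(5),(4.5),(4),(3.5),(3),(2.5),(2),(1.5),(1),(1.5),(1),(0.5),(0),(13),(2),(17),(5),(22),(24),(2),(3),(11);

SELECT tbl.id
      ,tbl.nmbr
       -- average over the current row, up to 3 rows before it and up to 3 rows after it
      ,AVG(tbl.nmbr) OVER(ORDER BY tbl.id
                          ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS AvgValue
FROM @tbl AS tbl
ORDER BY tbl.id;
```

At the start and end of the data the frame simply contains fewer rows, which mirrors how the BETWEEN filter in the CROSS APPLY version behaves at the edges.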