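Group data without changing the query flow

Time: 2015-01-29 06:58:58

Tags: sql sql-server performance sql-server-2008 sql-server-2008-r2

It is hard for me to explain what I want, so the title may be unclear, but I hope I can describe it with code.

I have some data with two important values: a time t and a value f(t). It is stored in a table, for example: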

1 - 1000
2 - 1200
3 - 1100
4 - 1500
...

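I want to draw a chart from this data, and the chart should contain N points. If the table has fewer than N rows, we simply return the table as it is. Otherwise we should group the points. For example, with N = Count/2 the sample above becomes: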

1 - (1000+1200)/2 = 1100
2 - (1100+1500)/2 = 1300
...

I wrote a SQL script (it works fine for N >> Count), where MonitoringDateTime is t and ResultCount is f(t):

ALTER PROCEDURE [dbo].[usp_GetRequestStatisticsData]
    @ResourceTypeID bigint,        
    @DateFrom datetime,         
    @DateTo datetime,            
    @EstimatedPointCount int
AS

BEGIN   
    SET NOCOUNT ON;
    SET ARITHABORT ON; 


    declare @groupSize int;  
    declare @resourceCount int;

    select @resourceCount = Count(*)
    from ResourceType
    where ID & @ResourceTypeID > 0


    SELECT d.ResultCount        
          ,MonitoringDateTime = d.GeneratedOnUtc
          ,ResourceType = a.ResourceTypeID,
          ROW_NUMBER() OVER(ORDER BY d.GeneratedOnUtc asc) AS Row
    into #t
    FROM dbo.AgentData d
      INNER JOIN dbo.Agent a ON a.CheckID = d.CheckID
    WHERE d.EventType = 'Result' AND
          a.ResourceTypeID & @ResourceTypeID > 0 AND
          d.GeneratedOnUtc between @DateFrom AND @DateTo AND
          d.Result = 1


    select @groupSize = Count(*) / (@EstimatedPointCount * @resourceCount)
    from #t

    if @groupSize = 0 -- return all points

        select ResourceType, MonitoringDateTime, ResultCount
        from #t

    else

        select ResourceType,   CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
        from #t 
        where [Row] % @groupSize = 0 
        group by ResourceType, [Row]
        order by MonitoringDateTime
END

However, it does not work for N ~= Count, and it spends a lot of time on the insert. That is why I wanted to use CTEs, but they do not work with the if/else statement.

So I worked out a formula for a group number (to use in the GROUP BY clause), since we have

GroupNumber = Count < N ? Row : Row*NumberOfGroups

where Count is the number of rows in the table, and NumberOfGroups = Count / EstimatedPointCount.

With some trivial math we get the formula

GroupNumber = Row + (Row*Count/EstimatedPointCount - Row)*MAX(Count - Count/EstimatedPointCount,0)/(Count - Count/EstimatedPointCount)

But it does not work because of the Count aggregate function:

Column 'dbo.AgentData.ResultCount' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
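
One possible way around this error, offered as a sketch rather than a tested fix for the procedure above, is to take the row count as a window aggregate, COUNT(*) OVER (), which is allowed next to non-aggregated columns. The table variable @samples and the names rn, total and grp are made up for illustration, and the CASE expression follows the GroupNumber idea but uses integer division in place of the multiplication in the formula.

```sql
-- Hypothetical sample data standing in for the real tables.
DECLARE @samples TABLE (ID int, sample int);
INSERT INTO @samples (ID, sample) VALUES
(1, 1000), (2, 1200), (3, 1100), (4, 1500);

DECLARE @EstimatedPointCount int = 2;

WITH numbered AS (
    SELECT ID,
           sample,
           ROW_NUMBER() OVER (ORDER BY ID) AS rn,
           COUNT(*) OVER () AS total      -- window aggregate, no GROUP BY needed
    FROM @samples
),
grouped AS (
    SELECT sample,
           CASE WHEN total <= @EstimatedPointCount
                THEN rn                   -- few rows: one group per row
                -- NULLIF guards against a zero group size
                ELSE (rn - 1) / NULLIF(total / @EstimatedPointCount, 0)
           END AS grp
    FROM numbered
)
SELECT grp, AVG(sample) AS avg_sample
FROM grouped
GROUP BY grp
ORDER BY grp;
```

With the four sample rows and @EstimatedPointCount = 2 this gives two groups with averages 1100 and 1300. When Count is not a multiple of the point count, integer division can produce more groups than requested, so treat the point count as approximate.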

My English is very bad, I know (and I am working on improving it), but hope dies last, so any advice is welcome.


Results of the query

SELECT d.ResultCount        
          , MonitoringDateTime = d.GeneratedOnUtc
          , ResourceType = a.ResourceTypeID
    FROM dbo.AgentData d
      INNER JOIN dbo.Agent a ON a.CheckID = d.CheckID
    WHERE   d.GeneratedOnUtc between '2015-01-28' AND '2015-01-30' AND
            a.ResourceTypeID & 1376256 > 0 AND
            d.EventType = 'Result' AND   
            d.Result = 1

https://onedrive.live.com/redir?resid=58A31FC352FC3D1A!6118&authkey=!AATDebemNJIgHoo&ithint=file%2ccsv

2 Answers:

Answer 0 (score: 3)

Here is an example using NTILE with the simple sample data from the top of the question:

declare @samples table (ID int, sample int)
insert into @samples (ID,sample) values
(1,1000),
(2,1200),
(3,1100),
(4,1500)

declare @results int
set @results = 2

;With grouped as (
    select *,NTILE(@results) OVER (order by ID) as nt
    from @samples
)
select nt,AVG(sample) from grouped
group by nt

Produces:

nt                   
-------------------- -----------
1                    1100
2                    1300

If @results is changed to 4 (or any higher number), you simply get the original result set back.
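
To illustrate the point about larger values, the sketch below reruns the same query with @results = 10, a value picked arbitrarily for this example. NTILE only hands out as many bucket numbers as there are rows, so each row lands in its own bucket and every average equals the original sample.

```sql
DECLARE @samples TABLE (ID int, sample int);
INSERT INTO @samples (ID, sample) VALUES
(1, 1000), (2, 1200), (3, 1100), (4, 1500);

-- More buckets requested than there are rows.
DECLARE @results int = 10;

WITH grouped AS (
    SELECT ID, sample,
           NTILE(@results) OVER (ORDER BY ID) AS nt
    FROM @samples
)
SELECT nt, AVG(sample) AS avg_sample   -- one row per bucket, four buckets in total
FROM grouped
GROUP BY nt
ORDER BY nt;
```

This is why no separate if/else branch is needed for the "fewer rows than points" case.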

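Unfortunately, I don't have your full data and can't fully work out what you are trying to do with the full stored procedure, so the above may need some adjusting.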
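
As one possible adjustment, here is a sketch of the same NTILE idea applied per resource type with an averaged timestamp. It assumes the chart needs roughly @EstimatedPointCount points for each resource type, which is only a guess based on the procedure in the question. The table variable @data stands in for the joined AgentData/Agent rows, and the names Bucket, AvgDateTime and AvgResultCount are invented for this example.

```sql
-- Hypothetical stand-in for the rows selected into #t in the question.
DECLARE @data TABLE (ResourceType bigint, MonitoringDateTime datetime, ResultCount int);
INSERT INTO @data (ResourceType, MonitoringDateTime, ResultCount) VALUES
(1, '2015-01-28T00:00:00', 1000),
(1, '2015-01-28T01:00:00', 1200),
(1, '2015-01-28T02:00:00', 1100),
(1, '2015-01-28T03:00:00', 1500),
(2, '2015-01-28T00:00:00', 10),
(2, '2015-01-28T01:00:00', 30);

DECLARE @EstimatedPointCount int = 2;

WITH bucketed AS (
    SELECT ResourceType, MonitoringDateTime, ResultCount,
           -- Buckets are numbered separately inside each resource type.
           NTILE(@EstimatedPointCount)
               OVER (PARTITION BY ResourceType ORDER BY MonitoringDateTime) AS Bucket
    FROM @data
)
SELECT ResourceType,
       -- datetime cannot be averaged directly, so go through float and back
       CAST(AVG(CAST(MonitoringDateTime AS float)) AS datetime) AS AvgDateTime,
       AVG(ResultCount) AS AvgResultCount   -- integer average, since ResultCount is int here
FROM bucketed
GROUP BY ResourceType, Bucket
ORDER BY ResourceType, AvgDateTime;
```

Because this is a single statement, it could also sit behind a CTE in the procedure without the temporary table, though whether that helps the insert cost would need to be measured on the real data.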

Answer 1 (score: 1)

I haven't tried it, but how about, instead of

select ResourceType,   CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
        from #t 
        where [Row] % @groupSize = 0 
        group by ResourceType, [Row]
        order by MonitoringDateTime

perhaps something like

select ResourceType,   CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
        from #t 
        group by ResourceType, convert(int,[Row]/@groupSize)
        order by MonitoringDateTime

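Maybe this will point you in a new direction? By converting to int we truncate everything after the decimal point, so I hope this gives you a better grouping. You may need to compute your row number per resource type for this to work.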
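
A minimal sketch of that suggestion, under a few assumptions: the table variable @t replaces the temporary table #t, the row number is restarted for each resource type, and rn - 1 is used (instead of the plain row number) so that the first group is not one row short. The aliases AvgDateTime and AvgResultCount are invented for this example.

```sql
-- Hypothetical stand-in for #t from the question.
DECLARE @t TABLE (ResourceType bigint, MonitoringDateTime datetime, ResultCount int);
INSERT INTO @t (ResourceType, MonitoringDateTime, ResultCount) VALUES
(1, '2015-01-28T00:00:00', 1000),
(1, '2015-01-28T01:00:00', 1200),
(1, '2015-01-28T02:00:00', 1100),
(1, '2015-01-28T03:00:00', 1500),
(2, '2015-01-28T00:00:00', 10),
(2, '2015-01-28T01:00:00', 30);

DECLARE @groupSize int = 2;

WITH numbered AS (
    SELECT ResourceType, MonitoringDateTime, ResultCount,
           -- Row number restarts at 1 for every resource type.
           ROW_NUMBER() OVER (PARTITION BY ResourceType
                              ORDER BY MonitoringDateTime) AS rn
    FROM @t
)
SELECT ResourceType,
       CAST(AVG(CAST(MonitoringDateTime AS float)) AS datetime) AS AvgDateTime,
       AVG(ResultCount) AS AvgResultCount
FROM numbered
-- Integer division maps rows 1..@groupSize to group 0, the next @groupSize rows to 1, and so on.
GROUP BY ResourceType, (rn - 1) / @groupSize
ORDER BY ResourceType, AvgDateTime;
```

Unlike the WHERE [Row] % @groupSize = 0 filter, which keeps only every @groupSize-th row, grouping on the quotient keeps all rows and averages them within each group.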