SQL query to calculate a median with grouping

Time: 2013-04-18 18:09:00

Tags: sql sql-server-2008 tsql

I have the following table.

DECLARE @TBL_RESULT Table
(   
    ID varchar(10),
    CreateDate DateTime,
    PEOPLE_CODE_ID varchar(10), 
    CONVERSION_DATE DateTime,
    CAMPUS varchar(20),
    DAYS_TOOK int   
);

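This table holds the dates of all leads that were received and converted from January 1, 2013 to date.

I initially needed to find the median time taken to convert the leads received in the last 10 weeks, grouped by campus. I was able to do that with the SQL query below: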

WITH    CTE_RESULT
          AS ( SELECT   *
               FROM     @TBL_RESULT
               WHERE    CreateDate > DATEADD(WEEK, -10, GETDATE())
             )
    SELECT  Campus ,
            AVG(DAYS_TOOK) AS MedianTime
    FROM    ( SELECT    CAMPUS ,
                        Days_Took ,
                        ROW_NUMBER() OVER ( PARTITION BY Campus ORDER BY Days_Took ASC ) AS AgeRank ,
                        COUNT(*) OVER ( PARTITION BY CAMPUS ) AS CampusCount
              FROM      CTE_RESULT
            ) x
    WHERE   x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
    GROUP BY x.Campus   
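
To make the rank filter easier to follow, here is a minimal sketch on made-up data. The @Sample table and its values are hypothetical and only serve the illustration. With an odd row count both expressions in the IN list point at the same middle row; with an even count they point at the two middle rows, which are then averaged. The sketch also shows that AVG over an int column returns an int in SQL Server, so multiplying by 1.0 may be preferable when a fractional median matters.

```sql
-- Hypothetical sample data: campus A has 5 rows (odd), campus B has 4 rows (even)
DECLARE @Sample TABLE (Campus varchar(20), Days_Took int);

INSERT @Sample (Campus, Days_Took)
VALUES ('A', 1), ('A', 3), ('A', 4), ('A', 8), ('A', 20),
       ('B', 2), ('B', 5), ('B', 6), ('B', 30);

SELECT  x.Campus,
        AVG(x.Days_Took)       AS MedianInt,      -- integer average, truncates
        AVG(1.0 * x.Days_Took) AS MedianDecimal   -- decimal average
FROM    ( SELECT Campus,
                 Days_Took,
                 ROW_NUMBER() OVER (PARTITION BY Campus ORDER BY Days_Took) AS AgeRank,
                 COUNT(*)     OVER (PARTITION BY Campus)                    AS CampusCount
          FROM   @Sample
        ) x
-- 5 rows: both expressions give rank 3; 4 rows: they give ranks 3 and 2
WHERE   x.AgeRank IN (x.CampusCount / 2 + 1, (x.CampusCount + 1) / 2)
GROUP BY x.Campus;
```

For campus A the single middle value 4 is returned. For campus B the middle values are 5 and 6, so the integer average is 5 while the decimal average is 5.5.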

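I now need to plot this as a trend on a chart, i.e. find the records for each preceding 10-week bucket and plot the medians on a line chart, with one line per campus (grouped by campus).

Is a cursor my only option? Starting from January 1, I would find the leads for the first 10 weeks, run the SQL query above to get the median, push it into a temp table, then look at the next 10 weeks, and so on.

Or can I do better?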

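2 answers:

Answer 0: (score: 3)

If you do not need to optimize the query and simply need the same result over multiple 10-week periods, you can extend the current range (10 weeks ago to today) to as many periods as required, threading a PeriodEndDate through the whole query, as shown below.

SQL Fiddle

MS SQL Server 2012 Schema Setup:

Query 1: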

DECLARE @TBL_RESULT Table
(   
    ID varchar(10),
    CreateDate DateTime,
    PEOPLE_CODE_ID varchar(10), 
    CONVERSION_DATE DateTime,
    CAMPUS varchar(20),
    DAYS_TOOK int   
);

-- fill the table with some dummy data from 2013-01-01
INSERT @TBL_RESULT (CreateDate, Campus, Days_Took)
SELECT DATEADD(D, A.Number, '20130101'), 'Campus' + Right(B.Number, 10),
       ABS(CAST(NEWID() AS binary(6)) % 130) + 1
FROM master..spt_values A
JOIN master..spt_values B on B.type='P' and B.number < 50 -- 50 campuses
WHERE A.type='P'
  AND DATEADD(D, A.Number, '20130101') <= GetDate();

-- This first CTE is used to create the required number of 10-week periods
WITH N(NUMBER) AS (
  SELECT 0
  union all
  select number+1 from N
  where Number <= DATEDIFF(WEEK, '20130101', GETDATE())
),
-- and from below here it's your query with the PeriodEndDate threaded through
CTE_RESULT AS (
  SELECT   DATEADD(WEEK, -Number, GETDATE()) PeriodEndDate,
           T.*
  FROM     @TBL_RESULT T
  CROSS    JOIN     N
           -- you see the range built up dynamically here
  WHERE    CreateDate > DATEADD(WEEK, -Number-10, GETDATE())
    AND    CreateDate < DATEADD(WEEK, -Number, GETDATE()) +1
)
SELECT  PeriodEndDate, Campus ,
        AVG(DAYS_TOOK) AS MedianTime
FROM    (
         SELECT PeriodEndDate,   CAMPUS ,
         Days_Took ,
         ROW_NUMBER() OVER ( PARTITION BY PeriodEndDate, Campus ORDER BY Days_Took ASC ) AS AgeRank ,
         COUNT(*) OVER ( PARTITION BY PeriodEndDate, CAMPUS ) AS CampusCount
         FROM      CTE_RESULT
        ) x
WHERE   x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.PeriodEndDate, x.Campus
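ORDER BY x.PeriodEndDate, x.Campus
-- the default recursion limit is 100; lift it so the week CTE N
-- keeps working once more than 100 weeks have passed since 2013-01-01
OPTION (MAXRECURSION 0);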
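
To inspect the periods that the number CTE produces, the following sketch lists only the window boundaries, without touching the data table. The fixed @Today value and the cap of four windows are assumptions made for readability; the query above uses GETDATE() and generates one window per week since 2013-01-01.

```sql
-- Hypothetical fixed anchor date so the output is reproducible
DECLARE @Today datetime = '20130418';

WITH N(Number) AS (
  SELECT 0
  UNION ALL
  SELECT Number + 1
  FROM   N
  WHERE  Number < 3          -- only four windows, for illustration
)
SELECT Number,
       DATEADD(WEEK, -Number - 10, @Today) AS PeriodStart,   -- exclusive lower bound
       DATEADD(WEEK, -Number, @Today)      AS PeriodEndDate  -- value used for grouping
FROM   N
ORDER BY Number;
```

Each row is a 10-week window that ends one week earlier than the previous one, so consecutive windows overlap by nine weeks. A lead therefore contributes to several PeriodEndDate groups, which gives a rolling median rather than disjoint buckets.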

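Answer 1: (score: 0)

It seems you have already solved the hard part of the problem.

To get what you want, you need to introduce a grouping variable. In this case, I measure the number of weeks elapsed and divide by 10 (SQL Server does integer division, so this produces an integer).

You just need to use it judiciously in the partition by and group by clauses: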

WITH CTE_RESULT AS (
       SELECT   t.*,
                DATEDIFF(week, CreateDate, GETDATE()) / 10 as groupnum
       FROM     @TBL_RESULT t
      )
SELECT Campus, groupnum, MIN(CreateDate), MAX(CreateDate),
       AVG(DAYS_TOOK) AS MedianTime
FROM (SELECT t.*,
              ROW_NUMBER() OVER (PARTITION BY groupnum, Campus ORDER BY Days_Took ASC ) AS AgeRank ,
              COUNT(*) OVER (PARTITION BY groupnum, CAMPUS) AS CampusCount
      FROM CTE_RESULT t
     ) x
WHERE x.AgeRank IN ( x.CampusCount / 2 + 1, ( x.CampusCount + 1 ) / 2 )
GROUP BY x.Campus, groupnum

I haven't tested this, so there may be a syntax error or two.
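
As a rough illustration of the grouping variable, the sketch below computes groupnum for a few made-up dates. The @Today value and the sample dates are hypothetical; the query above uses GETDATE() and the real CreateDate column.

```sql
-- Hypothetical fixed "today" so the bucket numbers are reproducible
DECLARE @Today datetime = '20130418';

SELECT d.CreateDate,
       DATEDIFF(week, d.CreateDate, @Today)      AS WeeksAgo,
       DATEDIFF(week, d.CreateDate, @Today) / 10 AS groupnum   -- integer division
FROM   (VALUES (CAST('20130415' AS datetime)),
               (CAST('20130301' AS datetime)),
               (CAST('20130205' AS datetime)),
               (CAST('20130101' AS datetime))) AS d(CreateDate)
ORDER BY d.CreateDate DESC;
```

Note that DATEDIFF(week, ...) counts week boundaries crossed rather than full 7-day spans, and that groupnum yields fixed, non-overlapping buckets counted backwards from today: 0 for the most recent bucket, 1 for the one before it, and so on.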