How do I select the maximum of the per-column sums, for each value of another column, within a datetime range?

Time: 2013-05-31 16:12:46

Tags: sql sql-server tsql sql-server-2012 distinct

CREATE TABLE [T]
(
    CreatedOn DATETIME NOT NULL
    ,Name NVARCHAR(20) NOT NULL
    ,Code NVARCHAR(20) NOT NULL
    ,R FLOAT NULL
    ,C1 FLOAT NULL
    ,C2 FLOAT NULL
    ,C3 FLOAT NULL
);

INSERT INTO [T] VALUES 
 ('2013-01-01', N'A', N'',    13, NULL,  NULL, NULL)
,('2013-01-07', N'A', N'1', NULL,    5,  NULL, NULL)
,('2013-01-31', N'A', N'2', NULL,    4,  NULL, NULL)
,('2013-02-01', N'A', N'1', NULL, NULL,     6, NULL)
,('2013-02-15', N'A', N'2', NULL, NULL,  NULL,    3)
,('2013-03-01', N'A', N'1', NULL,    1,  NULL, NULL)
,('2013-03-05', N'A', N'',     8, NULL,  NULL, NULL)
,('2013-03-22', N'A', N'2', NULL, NULL,  NULL,    1)
,('2013-05-01', N'A', N'1', NULL,    2,  NULL, NULL);

In [T]:
1. One and only one non-null value per row for [R], [C1], [C2] and [C3]
2. [Code] contains a non-empty value if [C1], [C2] or [C3] contains a non-null value
3. There is an index on [Name]
4. Contains millions of rows
5. Few unique values of [Code], typically fewer than 100
6. Few unique values of [Name], typically fewer than 10000
7. Is actually a complex view containing several inner joins

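How can I select ([DateMonth], [P]) from [T] where [CreatedOn] >= @Start AND [CreatedOn] <= @End AND [Name] = @Name, with [P] = Sum([R]) - (Sum, over each unique [Code], of MaxOf(Sum([C1]), Sum([C2]), Sum([C3])))? (See the expected output below for a more accurate "explanation".) The result set should contain one row for each month between @Start and @End, whether or not [T] has any rows for that month. Using temporary tables is acceptable.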

Expected Output
@Name = N'A'
@Start = '2012-12-01'
@End = '2013-07-01'

DateMonth    P
'2012-12-01' 0
'2013-01-01' 4  --  4 = SUM([R])=13 -      (MaxForCode'1'(SUM(C1)=5,     SUM(C2)=0,     SUM(C3)=0)=5 + MaxForCode'2'(SUM(C1)=4, SUM(C2)=0, SUM(C3)=0)=4)
'2013-02-01' 3  --  3 = SUM([R])=13 -      (MaxForCode'1'(SUM(C1)=5,     SUM(C2)=6,     SUM(C3)=0)=6 + MaxForCode'2'(SUM(C1)=4, SUM(C2)=0, SUM(C3)=3)=4)
'2013-03-01' 11 -- 11 = SUM([R])=13+8=21 - (MaxForCode'1'(SUM(C1)=5+1=6, SUM(C2)=6,     SUM(C3)=0)=6 + MaxForCode'2'(SUM(C1)=4, SUM(C2)=0, SUM(C3)=3+1=4)=4)
'2013-04-01' 11
'2013-05-01' 9  --  9 = SUM([R])=13+8=21 - (MaxForCode'1'(SUM(C1)=5+1+2=8, SUM(C2)=6,   SUM(C3)=0)=8 + MaxForCode'2'(SUM(C1)=4, SUM(C2)=0, SUM(C3)=3+1=4)=4)
'2013-06-01' 9
'2013-07-01' 9

1 answer:

Answer 0 (score: 1)

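Here is one solution. There are certainly performance improvements to be made, but I'll leave those to you and your specific situation. Note that the CTE is of course not required, and that adding CreatedOn to the index would be very helpful. A temp table might also perform better than a table variable, but you would need to evaluate that.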
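As a rough sketch of those two tuning ideas: the `@tt` table variable used in the script further down could be swapped for a temp table, and `CreatedOn` could be added to the index on `Name`. The names `#tt`, `IX_tt_DateMonth` and `IX_T_Name_CreatedOn` are made up for illustration, and the last statement assumes `[T]` is the plain sample table from the question. With the real view, the index would have to go on whichever base table supplies `Name` and `CreatedOn`.

```sql
-- Hypothetical temp-table stand-in for the @tt table variable.
IF OBJECT_ID('tempdb..#tt') IS NOT NULL
    DROP TABLE #tt;

CREATE TABLE #tt
(
    DateMonth DATETIME     NOT NULL,
    sum_r     FLOAT        NOT NULL,
    code      NVARCHAR(20) NOT NULL,
    max_c     FLOAT        NOT NULL
);

-- The final query groups by DateMonth, so cluster on it
-- (an assumption worth measuring, not a guaranteed win).
CREATE CLUSTERED INDEX IX_tt_DateMonth ON #tt (DateMonth);

-- Hypothetical covering index for the filter on Name and the CreatedOn range.
-- Assumes [T] is a base table, as in the sample data.
CREATE NONCLUSTERED INDEX IX_T_Name_CreatedOn
    ON [T] (Name, CreatedOn)
    INCLUDE (Code, R, C1, C2, C3);
```

With this in place, the `INSERT INTO @tt` and `FROM @tt` references would become `#tt`. Whether it is actually faster depends on row counts and should be checked against the real data.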

Since I think what you are looking for is a running total, this article may help improve the performance of the solution I am suggesting.
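For reference, SQL Server 2012 can express a running total directly with a windowed `SUM`. The fragment below is only a sketch of that idea against the sample `[T]`: it computes the cumulative `C1` per `[Code]` by month, and it does not fill in months that have no rows, so it is not a drop-in replacement for the full script. The aliases `MonthStart`, `c1` and `run_c1` are illustrative.

```sql
DECLARE @Name  NVARCHAR(20) = N'A',
        @Start DATETIME     = '2012-12-01',
        @End   DATETIME     = '2013-07-01';

SELECT m.Code,
       m.MonthStart,
       -- running total: all months of this code up to and including the current one
       SUM(m.c1) OVER (PARTITION BY m.Code
                       ORDER BY m.MonthStart
                       ROWS UNBOUNDED PRECEDING) AS run_c1
FROM (
        -- one row per (code, month) that actually has data
        SELECT Code,
               DATEADD(month, DATEDIFF(month, 0, CreatedOn), 0) AS MonthStart,
               SUM(ISNULL(C1, 0)) AS c1
        FROM T
        WHERE CreatedOn >= @Start AND CreatedOn <= @End
              AND Name = @Name
        GROUP BY Code, DATEADD(month, DATEDIFF(month, 0, CreatedOn), 0)
     ) m
ORDER BY m.Code, m.MonthStart;
```

The same pattern would apply to `R`, `C2` and `C3`. The appeal is that `[T]` is scanned once rather than once per month, which may matter when `[T]` is an expensive view.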

Personally, I would first consider not using the view at all, since running the SQL that defines the view directly may be more efficient.

Here is the SQL and a SQLFiddle link.

DECLARE @Name NVARCHAR(20) = N'A', --same width as T.Name, so longer names are not truncated
@Start DATETIME = '2012-12-01',
@End DATETIME = '2013-07-01'


--get the date for the first of the start and end months
DECLARE @StartMonth DATETIME = DATEADD(month, DATEDIFF(month, 0, @Start), 0)
DECLARE @EndMonth DATETIME = DATEADD(month, DATEDIFF(month, 0, @End), 0)


DECLARE @tt TABLE
(
  DateMonth DATETIME,
  sum_r FLOAT,
  code NVARCHAR(20),
  max_c FLOAT
)

--CTE to create a simple table with one row per month (dt) plus the first day of the next month (nxt)
;WITH Months
AS
(
    SELECT @StartMonth AS dt, DATEADD(month, 1, @StartMonth) AS nxt
    UNION ALL
    SELECT DATEADD(month, 1, dt) AS dt, DATEADD(month, 2, dt) AS nxt
    FROM Months
    WHERE dt < @EndMonth
)
--SELECT * FROM Months OPTION(MAXRECURSION 9965) --for the CTE, you could also select dates into a temp table/table var first


INSERT INTO @tt (DateMonth, sum_r, code, max_c)
SELECT M.dt DateMonth,
       ISNULL(t.sum_r,0) sum_r,
       ISNULL(t.code,'') code,
       ISNULL(t.max_c,0) max_c
      --sum_c1, sum_c2, sum_c3, cnt
FROM Months M
     OUTER APPLY (
                   SELECT   sum_r,
                            code,
                            CASE WHEN sum_c1 >= sum_c2 AND sum_c1 >= sum_c3 THEN sum_c1
                                 WHEN sum_c2 >= sum_c1 AND sum_c2 >= sum_c3 THEN sum_c2
                                 ELSE sum_c3
                            END max_c
                             --sum_c1, sum_c2, sum_c3, cnt
                    FROM (  --use a sub select here to improve case statement performance getting max_c
                            SELECT SUM(ISNULL(r,0)) sum_r, 
                                   code, 
                                   sum(ISNULL(c1,0)) sum_c1, 
                                   sum(ISNULL(c2,0)) sum_c2, 
                                   SUM(ISNULL(c3,0)) sum_c3
                            FROM T
                            --cumulative: everything from @Start up to the end of month M, capped at @End
                            WHERE CreatedOn >= @Start AND CreatedOn < M.nxt
                                   AND CreatedOn <= @End
                                   AND Name = @Name
                            GROUP BY code
                          ) subselect
                 ) t
OPTION (MAXRECURSION 999)



SELECT DateMonth, SUM(sum_r) - SUM(max_c) p
FROM @tt
GROUP BY DateMonth
ORDER BY DateMonth
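One possible alternative to the three-way `CASE`, sketched here with made-up data: a table value constructor can turn the three sums into rows so that `MAX` picks the largest, which may be easier to extend if more `C` columns appear. The `@sums` table variable and its rows are illustrative only; they mirror the cumulative sums for May 2013 in the expected output.

```sql
-- Illustrative data: cumulative sums per code as of 2013-05-01.
DECLARE @sums TABLE
(
    code   NVARCHAR(20),
    sum_c1 FLOAT,
    sum_c2 FLOAT,
    sum_c3 FLOAT
);

INSERT INTO @sums (code, sum_c1, sum_c2, sum_c3)
VALUES (N'1', 8, 6, 0),
       (N'2', 4, 0, 4);

SELECT s.code,
       mx.max_c
FROM @sums s
     -- unpivot the three sums into rows and take the largest
     CROSS APPLY (
                   SELECT MAX(v) AS max_c
                   FROM (VALUES (s.sum_c1), (s.sum_c2), (s.sum_c3)) AS x(v)
                 ) mx;
-- code '1' -> max_c = 8, code '2' -> max_c = 4
```

Note that `MAX` skips NULLs, whereas the `CASE` version relies on the sums already being wrapped in `ISNULL(..., 0)`. In the script above both behave the same because every sum is non-null.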