SQL data aggregation based on dates

Time: 2018-02-08 14:13:58

Tags: sql sql-server

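I'm sure this is a really dumb question and I'm just having a stupid moment. Consider the following basic scenario (a very small one compared to the real thing, which has many dimensions and measures):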

Data

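What I need to arrive at is the expected output: include all costs between the @input_date and @output_date defined in the params, but only show the latest PID, defined as:

1 - Where PIDs run in sequence or overlap, the latest one based on date_to, as long as both are not active at @output_date

2 - Where two PIDs are active at @output_date, show both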

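I can't for the life of me work out how to do this in SQL. Note that it has to be non-dynamic and, unfortunately, cannot use any CTEs, just basic SQL with subqueries.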

Obviously, returning the list of required IDs and PIDs is simple:

declare @input_date date ='2006-01-01'
declare @output_date date ='2006-12-31'

select a.PID, a.ID
from #tmp a
where date_from <=@output_date and date_to >=@input_date
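
The WHERE clause above is the usual interval-overlap test: a row's date_from..date_to range touches the parameter window when it starts on or before the window ends and ends on or after the window starts. A minimal sketch of that test follows. It assumes the sample rows from this question, copied into a table variable named @t (a name chosen only for this sketch), and it adds an in_window column purely for illustration:

```sql
-- Sample rows from the question; @t is a hypothetical name used only in this sketch.
declare @t table (date_from date, date_to date, ID nvarchar(25), PID nvarchar(25), cost float)
insert into @t values
    ('2005-01-01', '2005-01-31', '10001', 'X123',   1254.32),
    ('2000-10-10', '2006-08-21', '10005', 'TEST01', 21350.9636378758),
    ('2006-08-22', '2099-12-31', '10005', 'TEST02', 22593.4926163943),
    ('2006-01-01', '2099-12-31', '10006', 'X01',    22458.3342354444),
    ('2006-02-08', '2099-12-31', '10006', 'X02',    22480.3772331959),
    ('2006-01-01', '2006-02-07', '10007', 'AB01',   565.416874152212),
    ('2006-02-08', '2006-07-31', '10007', 'AA05',   19108.3206482165)

declare @input_date date = '2006-01-01'
declare @output_date date = '2006-12-31'

-- Flag each row instead of filtering it, to see which ranges touch the window.
select a.ID, a.PID, a.date_from, a.date_to,
       case when a.date_from <= @output_date   -- starts on or before the window ends
             and a.date_to   >= @input_date    -- ends on or after the window starts
            then 1 else 0 end as in_window
from @t a
order by a.ID, a.date_from
```

With these rows, only 10001 / X123 (January 2005) should be flagged 0; every other row overlaps 2006 in some way.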

But I can't figure out how to join onto this to return the correct cost values.

drop table tmp
CREATE TABLE [dbo].[tmp](
       [date_from] [datetime] NOT NULL,
       [date_to] [datetime] NOT NULL,
       [ID] [nvarchar](25) NOT NULL,
       [PID] [nvarchar](25) NOT NULL,
       [cost] [float] NULL
) ON [PRIMARY]
INSERT tmp VALUES('2005-1-1','2005-1-31','10001','X123',1254.32)
INSERT tmp VALUES('2000-10-10','2006-8-21','10005','TEST01',21350.9636378758)
INSERT tmp VALUES('2006-8-22','2099-12-31','10005','TEST02',22593.4926163943)
INSERT tmp VALUES('2006-1-1','2099-12-31','10006','X01',22458.3342354444)
INSERT tmp VALUES('2006-2-8','2099-12-31','10006','X02',22480.3772331959)
INSERT tmp VALUES('2006-1-1','2006-2-7','10007','AB01',565.416874152212)
INSERT tmp VALUES('2006-2-8','2006-7-31','10007','AA05',19108.3206482165)

I have made some progress using a CTE, so in case it helps, this is how I would do it if I were allowed to use one:

drop table #tmp 


CREATE TABLE #tmp (
       [date_from] [datetime] NOT NULL,
       [date_to] [datetime] NOT NULL,
       [ID] [nvarchar](25) NOT NULL,
       [PID] [nvarchar](25) NOT NULL,
       [cost] [float] NULL
) ON [PRIMARY]
INSERT #tmp  VALUES('2005-1-1','2005-1-31','10001','X123',1254.32)
INSERT #tmp  VALUES('2000-10-10','2006-8-21','10005','TEST01',21350.9636378758)
INSERT #tmp  VALUES('2006-8-22','2099-12-31','10005','TEST02',22593.4926163943)
INSERT #tmp  VALUES('2006-1-1','2099-12-31','10006','X01',22458.3342354444)
INSERT #tmp  VALUES('2006-2-8','2099-12-31','10006','X02',22480.3772331959)
INSERT #tmp  VALUES('2006-1-1','2006-2-7','10007','AB01',565.416874152212)
INSERT #tmp  VALUES('2006-2-8','2006-7-31','10007','AA05',19108.3206482165)

declare @input_date date ='2006-01-01'
declare @output_date date ='2006-12-31'


;with cte as (
select t.id,t.PID,t.cost,t.date_from,t.date_to , 
        iif(date_To >= @output_date  OR max_date_To is not null,PID,NULL) as PID2,
        b.total_id_cost 
    from #tmp  t
    left join (select ID,max(date_to) as max_date_to
                from #tmp
                where date_from <=@output_date and date_to >=@input_date
                group by ID) a
    on t.ID = a.ID and t.date_to = a.max_date_to
    left join (Select ID, sum(cost) as total_id_cost
                from  #tmp
                where date_from <=@output_date and date_to >=@input_date
                group by ID) b
    on t.ID = b.ID
    where date_from <=@output_date and date_to >=@input_date )


select distinct ID,PID2,
iif(ID in (
            select ID   
            from cte
            where PID2 IS NULL) 
and ID not in (select ID    
            from cte
            where PID IS NOT NULL
            group by ID
            having count (distinct PID2) >1  ), cte.total_id_cost, cost) as cost
from cte
where PID2 is not null;

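4 answers:

Answer 0 (score: 1)

So it looks like there are a few problems to solve in one query.

  1. We want the PID that matches the latest date. That is not too difficult and can be solved by joining the data to an aggregate of itself that finds the latest date.
  2. If both PIDs are active, i.e. their from and to dates overlap, then both must be shown. I found this trickier. In the end I wrote a query that finds the rows that overlap and fall within the date window, and counted the PIDs. That count is then used in the join condition (count = 1) so that the PID matching the latest date is picked conditionally.
  3. Finally, using the results of the above, you can do the sum to get the cost.

The resulting query is a bit of a monster, but here it is. Let me know if it doesn't cover other cases that weren't detailed.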

    DECLARE @Data TABLE (date_from DATETIME, date_to DATETIME, ID INT, PID NVARCHAR(50), COST MONEY)
    INSERT @Data VALUES('2005-1-1','2005-1-31','10001','X123',1254.32)
    INSERT @Data VALUES('2000-10-10','2006-8-21','10005','TEST01',21350.9636378758)
    INSERT @Data VALUES('2006-8-22','2099-12-31','10005','TEST02',22593.4926163943)
    INSERT @Data VALUES('2006-1-1','2099-12-31','10006','X01',22458.3342354444)
    INSERT @Data VALUES('2006-2-8','2099-12-31','10006','X02',22480.3772331959)
    INSERT @Data VALUES('2006-1-1','2006-2-7','10007','AB01',565.416874152212)
    INSERT @Data VALUES('2006-2-8','2006-7-31','10007','AA05',19108.3206482165)
    
    declare @input_date date ='2006-01-01'
    declare @output_date date ='2006-12-31'
    
    
    select
        a.ID,
        PIDForMaxDateThatMatches.PID,
        SUM(a.cost) as cost
    from
        @Data a
        inner join (
            -- number of PIDs for dates that overlap grouped by ID
            select
                a.ID,
                -- where there's no overlap then we want the count to be 1 so that later we can use it as condition
                COUNT(DISTINCT ISNULL(b.PID,'')) as NumberOfPID
            from
                @Data a
                -- may or may not find overlaps
                LEFT JOIN @Data b ON
                    b.date_from <=@output_date and
                    b.date_to >=@input_date and
                    a.date_from <= b.date_to and
                    a.date_to >= b.date_from and
                    a.ID = b.ID and
                    a.PID <> b.PID
            where
                a.date_from <=@output_date and
                a.date_to >=@input_date
            group by
                a.ID) as PIDCountForOverlappingMatches ON
            a.ID = PIDCountForOverlappingMatches.ID
        left join (
            -- get the PID that matches the max date_to 
            select
                DataForMaxDate.ID,
                DataForMaxDate.date_from,
                DataForMaxDate.date_to,
                DataForMaxDate.PID
            from
                @Data as DataForMaxDate
                inner join (
                    -- get the max date_to that matches the criteria
                    select
                        ID,
                        MAX(date_to) as maxDateTo
                    from
                        @Data a
                    where
                        date_from <=@output_date and
                        date_to >=@input_date
                    group by
                        ID) as MaxToDatePerID on
                DataForMaxDate.ID = MaxToDatePerID.ID and
                DataForMaxDate.date_to = MaxToDatePerID.maxDateTo) as PIDForMaxDateThatMatches on
            a.ID = PIDForMaxDateThatMatches.ID AND
            -- if there's no overlapping dates the PID count would be 1, which we'll take the PID that matches the max(date_to)
            -- but if there is overlap, then we want both dates to show, thus the from date must also match before we take the PID
            (PIDCountForOverlappingMatches.NumberOfPID = 1 OR a.date_from = PIDForMaxDateThatMatches.date_from)
    
    where
        a.date_from <= @output_date and
        a.date_to >= @input_date
    GROUP BY
        a.ID,
        PIDForMaxDateThatMatches.PID
    ORDER BY
        a.ID    
    

Edit: DB Fiddle http://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=d43cb4b9765da1bca035531e78a2c77d

Results:

    ID      PID     cost
    ----------------------------
    10005   TEST02  43944.4562
    10006   X01     22458.3342
    10006   X02     22480.3772
    10007   AA05    19673.7375
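
To see why ID 10006 keeps both PIDs while 10005 and 10007 collapse to one, the overlap count from step 2 can be run on its own. This is a minimal sketch of that one subquery, lifted from the query above and run against a table variable @t (a name chosen only for this sketch) holding the question's sample rows:

```sql
-- Sample rows from the question; @t is a hypothetical name used only in this sketch.
declare @t table (date_from date, date_to date, ID nvarchar(25), PID nvarchar(25), cost float)
insert into @t values
    ('2005-01-01', '2005-01-31', '10001', 'X123',   1254.32),
    ('2000-10-10', '2006-08-21', '10005', 'TEST01', 21350.9636378758),
    ('2006-08-22', '2099-12-31', '10005', 'TEST02', 22593.4926163943),
    ('2006-01-01', '2099-12-31', '10006', 'X01',    22458.3342354444),
    ('2006-02-08', '2099-12-31', '10006', 'X02',    22480.3772331959),
    ('2006-01-01', '2006-02-07', '10007', 'AB01',   565.416874152212),
    ('2006-02-08', '2006-07-31', '10007', 'AA05',   19108.3206482165)

declare @input_date date = '2006-01-01'
declare @output_date date = '2006-12-31'

select a.ID,
       -- COUNT(DISTINCT ...) ignores NULLs, so unmatched rows are mapped to ''
       -- to make the count come out as 1 when nothing overlaps.
       count(distinct isnull(b.PID, '')) as NumberOfPID
from @t a
left join @t b
       on b.ID = a.ID
      and b.PID <> a.PID                    -- a different PID for the same ID
      and b.date_from <= @output_date       -- b falls inside the parameter window
      and b.date_to   >= @input_date
      and a.date_from <= b.date_to          -- a and b overlap each other
      and a.date_to   >= b.date_from
where a.date_from <= @output_date
  and a.date_to   >= @input_date
group by a.ID
order by a.ID
```

With the sample rows this should give NumberOfPID = 1 for 10005 and 10007, whose PIDs follow one another without overlapping, and 2 for 10006, where X01 and X02 are active at the same time.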

Answer 1 (score: 0)

Hi, you can try the following query:

select a.ID, max(a.PID) PID, SUM(a.cost) Cost
from #tmp a
where date_from <= @output_date and date_to >= @input_date
group by a.ID
order by a.ID;
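
One thing worth checking with this approach is that MAX on a PID column returns the highest value in string sort order, which is not necessarily the PID with the latest date_to. The sketch below puts the two side by side. It assumes the question's sample rows and its ID / PID column names, loaded into a table variable @t (a name chosen only for this sketch):

```sql
-- Sample rows from the question; @t is a hypothetical name used only in this sketch.
declare @t table (date_from date, date_to date, ID nvarchar(25), PID nvarchar(25), cost float)
insert into @t values
    ('2005-01-01', '2005-01-31', '10001', 'X123',   1254.32),
    ('2000-10-10', '2006-08-21', '10005', 'TEST01', 21350.9636378758),
    ('2006-08-22', '2099-12-31', '10005', 'TEST02', 22593.4926163943),
    ('2006-01-01', '2099-12-31', '10006', 'X01',    22458.3342354444),
    ('2006-02-08', '2099-12-31', '10006', 'X02',    22480.3772331959),
    ('2006-01-01', '2006-02-07', '10007', 'AB01',   565.416874152212),
    ('2006-02-08', '2006-07-31', '10007', 'AA05',   19108.3206482165)

declare @input_date date = '2006-01-01'
declare @output_date date = '2006-12-31'

select a.ID,
       max(a.PID) as max_pid,                          -- highest PID by string order
       max(case when a.date_to = m.max_date_to
                then a.PID end) as pid_at_latest_date_to  -- PID on the row with the latest date_to
from @t a
join (select ID, max(date_to) as max_date_to           -- latest date_to per ID inside the window
      from @t
      where date_from <= @output_date and date_to >= @input_date
      group by ID) m
  on m.ID = a.ID
where a.date_from <= @output_date
  and a.date_to   >= @input_date
group by a.ID
order by a.ID
```

For 10007 the two columns should differ: AB01 sorts after AA05, but AA05 is the row with the later date_to. Grouping by ID alone also returns a single row for 10006, where the question asks for both X01 and X02.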

Answer 2 (score: 0)

I think this would work:

SELECT
    t1.ID,
    q1.PID,
    SUM(t1.cost) AS cost
FROM
    tmp AS t1
JOIN
    (
    SELECT
        q2.ID,
        t2.PID
    FROM
        (
        SELECT
            ID,
            MAX(date_to) AS maxdate
        FROM
            tmp
        GROUP BY
            ID
        ) AS q2
    JOIN
        tmp AS t2
    ON
        q2.ID = t2.ID
    AND
        q2.maxdate = t2.date_to
    ) AS q1
ON
    t1.ID = q1.ID
AND
    t1.PID = q1.PID
GROUP BY
    t1.ID,
    q1.PID

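Answer 3 (score: 0)

Here is a query without a CTE. The idea behind the query:

1) Find consecutive dates and create distinct groups within each id

2) Find the min and max dates and the sum of cost for each group

3) Apply the input parameter restriction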

declare @date_from date = '20060101'
declare @date_to date = '20061231'

declare @myTable table(
    date_from date
    , date_to date
    , id int
    , pid varchar(30)
    , cost decimal(10,2)
)
insert into @myTable values
    ('20050101', '20050201', 10001, 'x123', 1254.32)
    , ('20001010', '20060821', 10005, 'test01', 21350.96)
    , ('20060822', '20991231', 10005, 'test02', 22593.49)
    , ('20060101', '20991231', 10006, 'x01', 22548.33)
    , ('20060208', '20991231', 10006, 'x02', 22480.38)
    , ('20060101', '20060207', 10007, 'abo1', 565.42)
    , ('20060208', '20060731', 10007, 'abo2', 19108.32)

select
    date_from = min(date_from), date_to = max(date_to)
    , id, pid = max(case when date_to = max_date_to then pid end)
    , cost = sum(cost)
from (
    select
        a.date_from, a.date_to, a.id, a.pid, a.cost, a.rn, grp = sum(b.ss)
        , max_date_to = max(a.date_to) over (partition by a.id, sum(b.ss))
    from
        (
            select
                a.*, ss = case when datediff(dd, b.date_to, a.date_from) = 1 then 0 else 1 end
            from
                (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) a
                left join (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) b on a.id = b.id and a.rn - 1 = b.rn
        ) a
        left join (
            select
                a.*, ss = case when datediff(dd, b.date_to, a.date_from) = 1 then 0 else 1 end
            from
                (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) a
                left join (
                    select
                        *, rn = row_number() over (partition by id order by date_from)
                    from
                        @myTable
                ) b on a.id = b.id and a.rn - 1 = b.rn
        ) b on a.id = b.id and a.rn >= b.rn
    group by a.date_from, a.date_to, a.id, a.pid, a.cost, a.rn
) t
group by id, grp, max_date_to
having min(date_from) <= @date_from and max(date_to) >= @date_to
order by id

Output

date_from   date_to     id      pid     cost
------------------------------------------------
2000-10-10  2099-12-31  10005   test02  43944.45
2006-01-01  2099-12-31  10006   x01     22548.33

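The result differs slightly from the output you provided. But:

1) For id = 10006, pid = X02 has date_from = 08/02/2006, while the input is 01/01/2006

2) For id = 10007, date_to = 31/07/2006, while the input is 31/12/2006

So I think the query works correctly.

Using a CTE, in a more readable format: Rextester demo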
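
Step 1 of this answer (finding consecutive dates) rests on the ss flag, which is 0 when a row starts the day after the previous row for the same id ends, and 1 otherwise. A minimal sketch of just that step is shown below. It uses the same row_number self-join as the query above, but runs against the question's original sample rows in a table variable @t (a name chosen only for this sketch):

```sql
-- Sample rows from the question; @t is a hypothetical name used only in this sketch.
declare @t table (date_from date, date_to date, ID nvarchar(25), PID nvarchar(25), cost float)
insert into @t values
    ('2005-01-01', '2005-01-31', '10001', 'X123',   1254.32),
    ('2000-10-10', '2006-08-21', '10005', 'TEST01', 21350.9636378758),
    ('2006-08-22', '2099-12-31', '10005', 'TEST02', 22593.4926163943),
    ('2006-01-01', '2099-12-31', '10006', 'X01',    22458.3342354444),
    ('2006-02-08', '2099-12-31', '10006', 'X02',    22480.3772331959),
    ('2006-01-01', '2006-02-07', '10007', 'AB01',   565.416874152212),
    ('2006-02-08', '2006-07-31', '10007', 'AA05',   19108.3206482165)

select a.ID, a.PID, a.date_from, a.date_to,
       -- 0 = continues the previous row (starts exactly one day after it ends),
       -- 1 = starts a new group (no previous row, a gap, or an overlap).
       ss = case when datediff(dd, b.date_to, a.date_from) = 1 then 0 else 1 end
from (
        select *, rn = row_number() over (partition by ID order by date_from)
        from @t
     ) a
     left join (
        select *, rn = row_number() over (partition by ID order by date_from)
        from @t
     ) b on a.ID = b.ID and a.rn - 1 = b.rn   -- b is the previous row for the same ID
order by a.ID, a.rn
```

With these rows, ss should be 0 only for 10005 / TEST02 and 10007 / AA05, each of which starts the day after its predecessor ends. X02 overlaps X01 rather than following it, so it gets ss = 1 and lands in a group of its own once the flags are summed.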