SQL: get the number of days between rows, by group

Time: 2015-01-22 23:11:50

Tags: sql sql-server azure-sql-database

Environment: SQL Azure, so some of the fancier features are restricted.

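I have a table that records logistics events on assets, and from that table I want to calculate the number of days an asset spent at a facility. See the sample table below: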

AssetID LocationID SublocationID MoveDate
CAR1    LOC1       SUB1          1/1/2015 01:01:01
CAR1    LOC1       SUB2          1/3/2015 03:03:03
CAR1    LOC1       SUB1          1/4/2015 04:04:04
CAR1    LOC99      SUB99         1/5/2015 05:05:05
CAR1    LOC1       SUB1          1/9/2015 09:09:09

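This table records each move from one location/sublocation to another. I don't care about the sublocation. I only need to report the number of days an asset spent at each location. At first I went down this path: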

SELECT  AssetID,
        LocationID,
        DATEDIFF(DAY, MIN(MoveDate), MAX(MoveDate))
FROM    TABLE
GROUP BY AssetID, LocationID

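However, this quickly revealed a pitfall: in the data you can see that the asset moved from LOC1 to LOC99 and back to LOC1. My query would count every day from 1/1/2015 to 1/9/2015 for LOC1, when in fact the asset spent the time between 1/5 and 1/9 at LOC99.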

Is there a pure SQL way to achieve this?

4 answers:

Answer 0: (score: 0)

Use a FAST_FORWARD cursor, walk the table in date order, and build the result set in a temp table.

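It could be done with LEAD/LAG, but those are not available in Azure. Some non-cursor T-SQL solution is undoubtedly possible, but I doubt it would perform better than a cursor. A FAST_FORWARD cursor usually performs better than a query containing correlated subqueries.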
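A rough sketch of the cursor approach described above. It has not been run against Azure, and it assumes a source table with the question's columns. The table name dbo.AssetMoves, the temp table #stays and all variable names are hypothetical, chosen only for illustration. Consecutive rows for the same asset and location are folded into a single stay.

```sql
-- Sketch only: dbo.AssetMoves, #stays and the variable names are assumptions.
CREATE TABLE #stays (
    AssetID    varchar(10),
    LocationID varchar(10),
    ArrivedAt  datetime,
    LeftAt     datetime NULL   -- NULL means the asset is still there
);

DECLARE @AssetID varchar(10), @LocationID varchar(10), @MoveDate datetime;
DECLARE @CurAsset varchar(10) = NULL, @CurLocation varchar(10) = NULL,
        @CurStart datetime = NULL;

DECLARE mov_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT AssetID, LocationID, MoveDate
    FROM   dbo.AssetMoves
    ORDER BY AssetID, MoveDate;

OPEN mov_cursor;
FETCH NEXT FROM mov_cursor INTO @AssetID, @LocationID, @MoveDate;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- A new stay starts on the first row, on a new asset, or on a location change.
    IF @CurAsset IS NULL OR @AssetID <> @CurAsset OR @LocationID <> @CurLocation
    BEGIN
        -- Close the previous stay; it ends at this move only for the same asset.
        IF @CurAsset IS NOT NULL
            INSERT INTO #stays (AssetID, LocationID, ArrivedAt, LeftAt)
            VALUES (@CurAsset, @CurLocation, @CurStart,
                    CASE WHEN @AssetID = @CurAsset THEN @MoveDate END);

        SELECT @CurAsset = @AssetID, @CurLocation = @LocationID, @CurStart = @MoveDate;
    END

    FETCH NEXT FROM mov_cursor INTO @AssetID, @LocationID, @MoveDate;
END

-- Flush the last stay, which is still open.
IF @CurAsset IS NOT NULL
    INSERT INTO #stays (AssetID, LocationID, ArrivedAt, LeftAt)
    VALUES (@CurAsset, @CurLocation, @CurStart, NULL);

CLOSE mov_cursor;
DEALLOCATE mov_cursor;

SELECT AssetID, LocationID, ArrivedAt,
       DATEDIFF(DAY, ArrivedAt, ISNULL(LeftAt, GETDATE())) AS DaysIn
FROM   #stays
ORDER BY AssetID, ArrivedAt;

DROP TABLE #stays;
```

The final SELECT returns one row per stay; summing DaysIn per AssetID and LocationID would give a per-location total instead.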

Answer 1: (score: 0)

It should look something like this:

SELECT  [details].[AssetId],
        [details].[LocationId],
        DATEDIFF(DAY, MIN([details].[MovedInDate]), [details].[MoveOutDate]) AS DaysIn
FROM (
    SELECT DISTINCT movedInRow.[AssetId], [movedInRow].[LocationId], [movedInRow].[MoveDate] AS MovedInDate, ISNULL(nm.[MoveDate], GETDATE()) AS MoveOutDate
    FROM [dbo].[t1] movedInRow
        OUTER APPLY (
            SELECT TOP 1 [MoveDate]
            FROM [dbo].[t1]
            WHERE 
                [AssetId] = movedInRow.[AssetId]
                AND [LocationId] != movedInRow.[LocationId]
                AND [MoveDate] >= [movedInRow].[MoveDate]
                ORDER BY [MoveDate] ASC -- earliest later move to a different location
        ) nm
) AS details
GROUP BY
    [details].[AssetId],
    [details].[LocationId],
    [details].[MoveOutDate];

For some reason the MoveDate could be the same for two locations; this example does not check for that possibility.

AssetId LocationId DaysIn
CAR1    LOC1       4
CAR1    LOC1       13
CAR1    LOC99      4

Answer 2: (score: 0)

Without the LEAD or LAG window functions, and without any procedural T-SQL, you can make this work with a recursive CTE:

/*Create table and sample data*/

create table #mov (
  AssetID varchar(10),
  LocationID varchar(10),
  SublocationID varchar(10),
  MoveDate datetime
)

insert into #mov
select 'CAR1',    'LOC1',       'SUB1',          '1/1/2015 01:01:01' union all
select 'CAR1',    'LOC1',       'SUB2',          '1/3/2015 03:03:03' union all
select 'CAR1',    'LOC1',       'SUB1',          '1/4/2015 04:04:04' union all
select 'CAR1',    'LOC99',      'SUB99',         '1/5/2015 05:05:05' union all
select 'CAR1',    'LOC1',       'SUB1' ,         '1/9/2015 09:09:09' union all
select 'CAR2',    'LOC1',       'SUB1',          '1/1/2015 01:01:01' union all
select 'CAR2',    'LOC1',       'SUB2',          '1/3/2015 03:03:03' union all
select 'CAR2',    'LOC1',       'SUB1',          '1/4/2015 04:04:04' union all
select 'CAR2',    'LOC99',      'SUB99',         '1/5/2015 05:05:05' union all
select 'CAR2',    'LOC1',       'SUB1' ,         '1/9/2015 09:09:09'

/*Create CTEs*/
/*1. cteMov - adds the row number to the dataset*/
;with cteMov as (
    select AssetID, LocationID, MoveDate, row_number() over(partition by AssetID order by MoveDate) as rn
    from #mov
),
/*2. rec - recursive CTE that numbers each run of consecutive rows at the same location*/
rec as (
  select AssetID, LocationID, MoveDate, rn, 1 as rnk
  from cteMov
  where rn = 1
  union all
  select c.AssetID, c.LocationID, c.MoveDate, c.rn, case when c.LocationID = rec.LocationID then rec.rnk else rec.rnk + 1 end as rnk
  from cteMov as c
  join rec on c.AssetID = rec.AssetID and c.rn = rec.rn + 1
)
/*3. Final query*/
select
  rec1.AssetID, rec1.LocationID,
  datediff(dd, min(rec1.MoveDate), isnull(min(rec2.MoveDate), getdate())) as DaysSpent, 
  rec1.rnk
from rec as rec1
/*the stay ends at the first move of the same asset's next group*/
left join rec as rec2 on rec1.AssetID = rec2.AssetID and rec1.rnk = rec2.rnk - 1
group by rec1.AssetID, rec1.LocationID, rec1.rnk
order by rec1.AssetID, rec1.rnk
option(MAXRECURSION  0)

/*drop temp table */
drop table #mov

The result is:

AssetID    LocationID DaysSpent   rnk
---------- ---------- ----------- -----------
CAR1       LOC1       4           1
CAR1       LOC99      4           2
CAR1       LOC1       13          3
CAR2       LOC1       4           1
CAR2       LOC99      4           2
CAR2       LOC1       13          3

Answer 3: (score: 0)

Using the sample data from an earlier reply:

create table t1 (
  AssetID varchar(10),
  LocationID varchar(10),
  SublocationID varchar(10),
  MoveDate datetime
);


insert into t1
select 'CAR1',    'LOC1',       'SUB1',          '1/1/2015 01:01:01' union all
select 'CAR1',    'LOC1',       'SUB2',          '1/3/2015 03:03:03' union all
select 'CAR1',    'LOC1',       'SUB1',          '1/4/2015 04:04:04' union all
select 'CAR1',    'LOC99',      'SUB99',         '1/5/2015 05:05:05' union all
select 'CAR1',    'LOC1',       'SUB1' ,         '1/9/2015 09:09:09' union all
select 'CAR2',    'LOC1',       'SUB1',          '1/1/2015 01:01:01' union all
select 'CAR2',    'LOC1',       'SUB2',          '1/3/2015 03:03:03' union all
select 'CAR2',    'LOC1',       'SUB1',          '1/4/2015 04:04:04' union all
select 'CAR2',    'LOC99',      'SUB99',         '1/5/2015 05:05:05' union all
select 'CAR2',    'LOC1',       'SUB1' ,         '1/9/2015 09:09:09';

select * from t1;

╔═════════╦════════════╦═══════════════╦════════════════════════════════╗
║ ASSETID ║ LOCATIONID ║ SUBLOCATIONID ║            MOVEDATE            ║
╠═════════╬════════════╬═══════════════╬════════════════════════════════╣
║ CAR1    ║ LOC1       ║ SUB1          ║ January, 01 2015 01:01:01+0000 ║
║ CAR2    ║ LOC1       ║ SUB1          ║ January, 01 2015 01:01:01+0000 ║
║ CAR2    ║ LOC1       ║ SUB2          ║ January, 03 2015 03:03:03+0000 ║
║ CAR1    ║ LOC1       ║ SUB2          ║ January, 03 2015 03:03:03+0000 ║
║ CAR1    ║ LOC1       ║ SUB1          ║ January, 04 2015 04:04:04+0000 ║
║ CAR2    ║ LOC1       ║ SUB1          ║ January, 04 2015 04:04:04+0000 ║
║ CAR2    ║ LOC99      ║ SUB99         ║ January, 05 2015 05:05:05+0000 ║
║ CAR1    ║ LOC99      ║ SUB99         ║ January, 05 2015 05:05:05+0000 ║
║ CAR1    ║ LOC1       ║ SUB1          ║ January, 09 2015 09:09:09+0000 ║
║ CAR2    ║ LOC1       ║ SUB1          ║ January, 09 2015 09:09:09+0000 ║
╚═════════╩════════════╩═══════════════╩════════════════════════════════╝    

If the lead() analytic function were supported, the preferred solution (in terms of both simplicity and performance) would be:

select AssetID, LocationID, 
       sum(datediff(dd,MoveDate,isnull(nextMoveDate,getDate()))) daysAtLoc
from (
    select AssetID, LocationID, MoveDate, 
           lead(MoveDate) over (partition by AssetID
                               order by MoveDate) nextMoveDate
    from t1
    ) t2
group by AssetID, LocationID
order by AssetID, LocationID;

╔═════════╦════════════╦═══════════╗
║ ASSETID ║ LOCATIONID ║ DAYSATLOC ║
╠═════════╬════════════╬═══════════╣
║ CAR1    ║ LOC1       ║        18 ║
║ CAR1    ║ LOC99      ║         4 ║
║ CAR2    ║ LOC1       ║        18 ║
║ CAR2    ║ LOC99      ║         4 ║
╚═════════╩════════════╩═══════════╝

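Pure SQL solution: no analytic functions, no recursive CTEs, no OUTER APPLY or correlated subqueries; just plain joins. I have never used Azure SQL, but I would be very surprised if it did not support this (and still called itself SQL).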

select AssetID, LocationID, 
       sum(datediff(dd,MoveDate,isnull(nextMoveDate,getdate()))) daysAtLoc
from (
    select  t1.AssetID, LocationID, MoveDate,
            min(nextMoveDate) nextMoveDate
    from    t1 
            left outer join
               (select AssetID, MoveDate nextMoveDate
                from   t1) n
                on t1.AssetId = n.AssetID
                   and MoveDate < nextMoveDate
    group by t1.AssetID, LocationID, MoveDate
    ) t2
group by AssetID, LocationID
order by AssetID, LocationID


╔═════════╦════════════╦═══════════╗
║ ASSETID ║ LOCATIONID ║ DAYSATLOC ║
╠═════════╬════════════╬═══════════╣
║ CAR1    ║ LOC1       ║        18 ║
║ CAR1    ║ LOC99      ║         4 ║
║ CAR2    ║ LOC1       ║        18 ║
║ CAR2    ║ LOC99      ║         4 ║
╚═════════╩════════════╩═══════════╝

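Performance warning: let n be the maximum number of moves per asset and m the number of assets. The analytic-function version should have Big-O performance of m * (n log n). The pure SQL version should have Big-O of m * (n * n). So if you track a constant pool of assets but keep adding more and more moves over time (so that the number of moves per asset grows steadily), the query will slow down quadratically. If you query over a long period and have hundreds or thousands of moves recorded for a single asset, you may need to compute in monthly batches and then sum those results. That said, if you have a huge pool of assets with relatively few moves each, the pure SQL version should perform comparably to the analytic-function version.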
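One way to see the n * n term is to count the rows the self-join produces before the inner group by collapses them. This is only an illustration against the t1 sample table above, assuming distinct move dates per asset; the column alias joinedPairs is made up here.

```sql
-- Counts the (move, later move) pairs per asset that the self-join produces.
-- With n distinct move dates for an asset this is n * (n - 1) / 2.
select   t1.AssetID,
         count(*) as joinedPairs
from     t1
         join t1 n
           on  t1.AssetID  = n.AssetID
           and t1.MoveDate < n.MoveDate
group by t1.AssetID
order by t1.AssetID;
```

With the five moves per asset in the first sample data set, each asset yields 5 * 4 / 2 = 10 pairs; at 1,000 moves per asset it would be 499,500. The solution's left outer join additionally keeps one row for each asset's latest move, which has no later move to pair with.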

- Edit 1: Fixed a typo in the original SQL solution (an additional problem)

- Edit 2: Extended the solution to support date ranges; also tweaked the input data slightly to verify the solution's robustness.

create table t1 (
    AssetID varchar(10), 
    LocationID varchar(10), 
    SublocationID varchar(10), 
    MoveDate datetime, 
    primary key (AssetId, MoveDate));

insert into t1
select 'CAR1', 'LOC1',  'SUB1',  '01/01/2015 00:00:00' union 
select 'CAR1', 'LOC1',  'SUB2',  '01/03/2015 03:03:03' union 
select 'CAR1', 'LOC1',  'SUB1',  '01/04/2015 04:04:04' union 
select 'CAR1', 'LOC99', 'SUB99', '01/05/2015 05:05:05' union 
select 'CAR1', 'LOC1',  'SUB1' , '01/09/2015 09:09:09' union 
select 'CAR2', 'LOC1',  'SUB2',  '01/03/2015 03:03:03' union 
select 'CAR2', 'LOC1',  'SUB1',  '01/04/2015 04:04:04' union 
select 'CAR2', 'LOC99', 'SUB99', '01/05/2015 05:05:05' union 
select 'CAR2', 'LOC1',  'SUB1' , '01/09/2015 09:09:09' union 
select 'CAR3', 'LOC2',  'SUB1' , '01/15/2015 15:15:15'
;

╔═════════╦════════════╦═══════════════╦════════════════════════════════╗
║ ASSETID ║ LOCATIONID ║ SUBLOCATIONID ║            MOVEDATE            ║
╠═════════╬════════════╬═══════════════╬════════════════════════════════╣
║ CAR1    ║ LOC1       ║ SUB1          ║ January, 01 2015 00:00:00+0000 ║
║ CAR1    ║ LOC1       ║ SUB2          ║ January, 03 2015 03:03:03+0000 ║
║ CAR1    ║ LOC1       ║ SUB1          ║ January, 04 2015 04:04:04+0000 ║
║ CAR1    ║ LOC99      ║ SUB99         ║ January, 05 2015 05:05:05+0000 ║
║ CAR1    ║ LOC1       ║ SUB1          ║ January, 09 2015 09:09:09+0000 ║
║ CAR2    ║ LOC1       ║ SUB2          ║ January, 03 2015 03:03:03+0000 ║
║ CAR2    ║ LOC1       ║ SUB1          ║ January, 04 2015 04:04:04+0000 ║
║ CAR2    ║ LOC99      ║ SUB99         ║ January, 05 2015 05:05:05+0000 ║
║ CAR2    ║ LOC1       ║ SUB1          ║ January, 09 2015 09:09:09+0000 ║
║ CAR3    ║ LOC2       ║ SUB1          ║ January, 15 2015 15:15:15+0000 ║
╚═════════╩════════════╩═══════════════╩════════════════════════════════╝

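Of course, you don't need to use a table for dt_range; I only did so to test a variety of conditions at the same time. I prefer to handle date ranges as [currentstart, nextstart), because it makes it much easier to write SQL whose ranges do not overlap, e.g. for monthly reports.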
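A small illustration of why the start-inclusive, end-exclusive convention avoids overlap, which is also how the query below compares ArrivalTime against thisStartDate and nextStartDate. The two ranges and the @t timestamp are made up for this example.

```sql
-- Illustrative only: a timestamp that sits exactly on a month boundary.
declare @t datetime = '2015-02-01T00:00:00';

select r.thisStartDate, r.nextStartDate
from (
    select cast('2015-01-01' as date) as thisStartDate,
           cast('2015-02-01' as date) as nextStartDate
    union all
    select cast('2015-02-01' as date),
           cast('2015-03-01' as date)
) r
where @t >= r.thisStartDate   -- the start of the range is inclusive
  and @t <  r.nextStartDate;  -- the start of the next range is exclusive
```

Only the February range matches. Written with BETWEEN, which includes both endpoints, the same timestamp would fall into both ranges and be counted twice in a monthly report.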

create table dt_range 
    (thisStartDate date, 
     nextStartDate date, 
     primary key (thisStartDate,nextStartDate));

insert into dt_range 
select '01-dec-2014','01-jan-2015' union
select '01-jan-2015','01-feb-2015' union
select '02-jan-2015','09-jan-2015' union
select '01-feb-2015','01-mar-2015' ;

╔═══════════════╦═══════════════╗
║ THISSTARTDATE ║ NEXTSTARTDATE ║
╠═══════════════╬═══════════════╣
║ 2014-12-01    ║ 2015-01-01    ║
║ 2015-01-01    ║ 2015-02-01    ║
║ 2015-01-02    ║ 2015-01-09    ║
║ 2015-02-01    ║ 2015-03-01    ║
╚═══════════════╩═══════════════╝

Query:

select  thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, 
        round(sum(datediff(ss,ArrivalTime, DepartureTime))/(24.0*60*60),1) DaysAtLoc
from (        
select  thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, ArrivalTime, 
        coalesce(min(MoveDate),nextStartDate) DepartureTime
from (
select  assetsInRange.thisStartDate, assetsInRange.nextStartDate, assetsInRange.assetID, 
        coalesce(ArrivalLocation,InitialLocation) ArrivalLocation, 
        coalesce(ArrivalTime,assetsInRange.thisStartDate) ArrivalTime
from 
      (
        select  thisStartDate, nextStartDate, assetID
        from    dt_range
                join t1 on MoveDate < nextStartDate
        group by thisStartDate, nextStartDate, assetID        
      ) assetsInRange
    left outer join
      (
        select  thisStartDate, nextStartDate, assetID, 
                max(MoveDate) precedingDtRangeMoveDt 
        from    dt_range 
                join t1 
                    on MoveDate < thisStartDate
        group by thisStartDate, nextStartDate, assetID
      ) 
      precedingMoveDt
        on (assetsInRange.assetID = precedingMoveDt.assetID)
    left outer join
      (
        select AssetID, MoveDate precedingDtRangeMoveDt, LocationID initialLocation
        from   t1
      ) 
        precedingMoveLoc
        on (precedingMoveDt.assetID = precedingMoveLoc.AssetID
            and precedingMoveDt.precedingDtRangeMoveDt = precedingMoveLoc.precedingDtRangeMoveDt)
    left outer join 
      (
        select AssetId, LocationId ArrivalLocation, MoveDate ArrivalTime
        from t1
      ) 
        arrivals 
        on assetsInRange.AssetID = arrivals.AssetId
                and ArrivalTime >= assetsInRange.thisStartDate
                and ArrivalTime < assetsInRange.nextStartDate
    group by assetsInRange.thisStartDate, assetsInRange.nextStartDate, assetsInRange.AssetId, 
            coalesce(ArrivalLocation,InitialLocation) , 
            coalesce(ArrivalTime,assetsInRange.thisStartDate) 
) t
left join t1 on t.assetID = t1.assetID
            and t1.MoveDate > ArrivalTime
            and t1.MoveDate < nextStartDate
group by thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, ArrivalTime
) t
group by thisStartDate, nextStartDate, t.AssetID, ArrivalLocation
order by 1, 3;

Result:

╔═══════════════╦═══════════════╦═════════╦═════════════════╦═══════════╗
║ THISSTARTDATE ║ NEXTSTARTDATE ║ ASSETID ║ ARRIVALLOCATION ║ DAYSATLOC ║
╠═══════════════╬═══════════════╬═════════╬═════════════════╬═══════════╣
║ 2015-01-01    ║ 2015-02-01    ║ CAR1    ║ LOC1            ║ 26.8      ║
║ 2015-01-01    ║ 2015-02-01    ║ CAR1    ║ LOC99           ║ 4.2       ║
║ 2015-01-01    ║ 2015-02-01    ║ CAR2    ║ LOC1            ║ 24.7      ║
║ 2015-01-01    ║ 2015-02-01    ║ CAR2    ║ LOC99           ║ 4.2       ║
║ 2015-01-01    ║ 2015-02-01    ║ CAR3    ║ LOC2            ║ 16.4      ║
║ 2015-01-02    ║ 2015-01-09    ║ CAR1    ║ LOC1            ║ 2.1       ║
║ 2015-01-02    ║ 2015-01-09    ║ CAR1    ║ LOC99           ║ 3.8       ║
║ 2015-01-02    ║ 2015-01-09    ║ CAR2    ║ LOC1            ║ 2.1       ║
║ 2015-01-02    ║ 2015-01-09    ║ CAR2    ║ LOC99           ║ 3.8       ║
║ 2015-02-01    ║ 2015-03-01    ║ CAR1    ║ LOC1            ║ 28        ║
║ 2015-02-01    ║ 2015-03-01    ║ CAR2    ║ LOC1            ║ 28        ║
║ 2015-02-01    ║ 2015-03-01    ║ CAR3    ║ LOC2            ║ 28        ║
╚═══════════════╩═══════════════╩═════════╩═════════════════╩═══════════╝    

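Note: I am assuming that an asset's first record means it did not exist at any location before that, so the test month of December 2014 to January 2015 does not appear in the results, because no asset has a move date in 2014.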