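Environment: SQL Azure, so some of the fancier features are limited.
I have a table that records logistics events for assets, and from that table I want to calculate the number of days an asset spent at a facility. See the sample table below: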
AssetID LocationID SublocationID MoveDate
CAR1 LOC1 SUB1 1/1/2015 01:01:01
CAR1 LOC1 SUB2 1/3/2015 03:03:03
CAR1 LOC1 SUB1 1/4/2015 04:04:04
CAR1 LOC99 SUB99 1/5/2015 05:05:05
CAR1 LOC1 SUB1 1/9/2015 09:09:09
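This table records moves from one location/sublocation to another location/sublocation. I don't care about the sublocation; I only need to report the number of days the asset spent at each location. At first I went down this path: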
SELECT AssetID,
LocationID,
DATEDIFF(DAY, MIN(MoveDate), MAX(MoveDate))
FROM TABLE
GROUP BY AssetID, LocationID
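However, this quickly ran into a pitfall: in the data you can see the asset move from LOC1 to LOC99 and back to LOC1. My query would count every day from 1/1/2015 to 1/9/2015 as time at LOC1, when in fact the asset spent the period between 1/5 and 1/9 at LOC99.
Is there a pure SQL way to achieve this?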
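One way to picture what the question is asking for is the "gaps and islands" pattern: group consecutive rows at the same location into one stay, then measure each stay up to the asset's next move. The sketch below is not taken from any of the answers; it assumes the table is called dbo.t1 (a placeholder name), that MoveDate is unique per asset, and that ROW_NUMBER() and OUTER APPLY are available on the target server. Note that DATEDIFF(DAY, ...) counts calendar-day boundaries crossed, not elapsed 24-hour periods.

```sql
-- Sketch only: dbo.t1 and the column aliases are assumptions for illustration.
WITH numbered AS (
    SELECT AssetID, LocationID, MoveDate,
           -- The difference of the two row numbers is constant within one
           -- unbroken run of rows at the same location.
           ROW_NUMBER() OVER (PARTITION BY AssetID ORDER BY MoveDate)
         - ROW_NUMBER() OVER (PARTITION BY AssetID, LocationID ORDER BY MoveDate) AS grp
    FROM dbo.t1
),
stays AS (
    SELECT AssetID, LocationID, grp,
           MIN(MoveDate) AS ArrivedAt,
           MAX(MoveDate) AS LastSeenAt
    FROM numbered
    GROUP BY AssetID, LocationID, grp
)
SELECT s.AssetID, s.LocationID, s.ArrivedAt,
       -- An open-ended stay (no later move) is measured up to the current date.
       DATEDIFF(DAY, s.ArrivedAt, ISNULL(nxt.LeftAt, GETDATE())) AS DaysIn
FROM stays AS s
OUTER APPLY (
    -- The first move after the last row of the stay is the departure.
    SELECT MIN(m.MoveDate) AS LeftAt
    FROM dbo.t1 AS m
    WHERE m.AssetID = s.AssetID
      AND m.MoveDate > s.LastSeenAt
) AS nxt
ORDER BY s.AssetID, s.ArrivedAt;
```

For the sample rows this yields one row per stay (LOC1, then LOC99, then LOC1 again) instead of one merged LOC1 row; wrapping it in a SUM per AssetID and LocationID would give per-location totals.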
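Answer 0 (score: 0)
Use a FAST_FORWARD cursor, walk the table in date order, and build the result set in a temp table.
This could be done with LEAD or LAG, but they are not available in Azure. Some non-cursor T-SQL solution is no doubt possible, but I doubt it would perform better than a cursor. A FAST_FORWARD cursor usually performs better than a query containing correlated subqueries.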
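The answer gives no code, so the following is only a sketch of how such a cursor might look. It assumes a table dbo.t1(AssetID, LocationID, MoveDate) with non-null AssetID and LocationID; the temp table #stays and all variable names are hypothetical.

```sql
-- Sketch only: #stays and the variable names are assumptions for illustration.
CREATE TABLE #stays (
    AssetID    varchar(10),
    LocationID varchar(10),
    ArrivedAt  datetime,
    LeftAt     datetime NULL   -- NULL means the asset is still there
);

DECLARE @AssetID varchar(10), @LocationID varchar(10), @MoveDate datetime;
DECLARE @curAsset varchar(10) = NULL, @curLoc varchar(10) = NULL, @curStart datetime = NULL;

DECLARE mov CURSOR LOCAL FAST_FORWARD FOR
    SELECT AssetID, LocationID, MoveDate
    FROM dbo.t1
    ORDER BY AssetID, MoveDate;

OPEN mov;
FETCH NEXT FROM mov INTO @AssetID, @LocationID, @MoveDate;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- A new stay starts when the asset or the location changes.
    IF @curAsset IS NULL OR @AssetID <> @curAsset OR @LocationID <> @curLoc
    BEGIN
        -- Close the previous stay; its end is known only if the same asset moved on.
        IF @curAsset IS NOT NULL
            INSERT INTO #stays (AssetID, LocationID, ArrivedAt, LeftAt)
            VALUES (@curAsset, @curLoc, @curStart,
                    CASE WHEN @AssetID = @curAsset THEN @MoveDate END);

        SELECT @curAsset = @AssetID, @curLoc = @LocationID, @curStart = @MoveDate;
    END;

    FETCH NEXT FROM mov INTO @AssetID, @LocationID, @MoveDate;
END;

-- Flush the last stay, which is still open.
IF @curAsset IS NOT NULL
    INSERT INTO #stays (AssetID, LocationID, ArrivedAt, LeftAt)
    VALUES (@curAsset, @curLoc, @curStart, NULL);

CLOSE mov;
DEALLOCATE mov;

SELECT AssetID, LocationID, ArrivedAt,
       DATEDIFF(DAY, ArrivedAt, ISNULL(LeftAt, GETDATE())) AS DaysIn
FROM #stays
ORDER BY AssetID, ArrivedAt;

DROP TABLE #stays;
```

The loop reads each row once in (AssetID, MoveDate) order, so the work grows linearly with the number of moves once the rows are sorted.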
Answer 1 (score: 0)
It should look something like this:
SELECT [details].[AssetId],
[details].[LocationId],
DATEDIFF(DAY, MIN([details].[MovedInDate]), [details].[MoveOutDate]) AS DaysIn
FROM (
SELECT DISTINCT movedInRow.[AssetId], [movedInRow].[LocationId], [movedInRow].[MoveDate] AS MovedInDate, ISNULL(nm.[MoveDate], GETDATE()) AS MoveOutDate
FROM [dbo].[t1] movedInRow
OUTER APPLY (
SELECT TOP 1 [MoveDate]
FROM [dbo].[t1]
WHERE
[AssetId] = movedInRow.[AssetId]
AND [LocationId] != movedInRow.[LocationId]
AND [MoveDate] >= [movedInRow].[MoveDate]
ORDER BY [MoveDate] ASC -- earliest later move to a different location
) nm
) AS details
GROUP BY
[details].[AssetId],
[details].[LocationId],
[details].[MoveOutDate];
For some reason, two locations could have the same MoveDate; this example does not check for that possibility.
AssetId LocationId DaysIn
CAR1 LOC1 4
CAR1 LOC1 13
CAR1 LOC99 4
Answer 2 (score: 0)
Without the LEAD or LAG window functions and without any procedural T-SQL code, you can make this work with a recursive CTE:
/*Create table and sample data*/
create table #mov (
AssetID varchar(10),
LocationID varchar(10),
SublocationID varchar(10),
MoveDate datetime
)
insert into #mov
select 'CAR1', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR1', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR1', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR1', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR1', 'LOC1', 'SUB1' , '1/9/2015 09:09:09' union all
select 'CAR2', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR2', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR2', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR2', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR2', 'LOC1', 'SUB1' , '1/9/2015 09:09:09'
/*Create CTEs*/
/*1. cteMov - adds the row number to the dataset*/
;with cteMov as (
select AssetID, LocationID, MoveDate, row_number() over(partition by AssetID order by MoveDate) as rn
from #mov
),
/*2. rec - recursive CTE that numbers each group of consecutive rows at the same location*/
rec as (
select AssetID, LocationID, MoveDate, rn, 1 as rnk
from cteMov
where rn = 1
union all
select c.AssetID, c.LocationID, c.MoveDate, c.rn, case when c.LocationID = rec.LocationID then rec.rnk else rec.rnk + 1 end as rnk
from cteMov as c
join rec on c.AssetID = rec.AssetID and c.rn = rec.rn + 1
)
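/*3. Final query*/
select
rec1.AssetID, rec1.LocationID,
/* the earliest row of the next group is the departure from this one */
datediff(dd, min(rec1.MoveDate), isnull(min(rec2.MoveDate), getdate())) as DaysSpent,
rec1.rnk
from rec as rec1
/* match on AssetID as well, so groups of different assets are never paired */
left join rec as rec2 on rec1.AssetID = rec2.AssetID and rec1.rnk = rec2.rnk - 1
group by rec1.AssetID, rec1.LocationID, rec1.rnk
order by rec1.AssetID, rec1.rnk
option(MAXRECURSION 0)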
/*drop temp table */
drop table #mov
The result is:
AssetID LocationID DaysSpent rnk
---------- ---------- ----------- -----------
CAR1 LOC1 4 1
CAR1 LOC99 4 2
CAR1 LOC1 13 3
CAR2 LOC1 4 1
CAR2 LOC99 4 2
CAR2 LOC1 13 3
Answer 3 (score: 0)
Using the sample data from an earlier reply:
create table t1 (
AssetID varchar(10),
LocationID varchar(10),
SublocationID varchar(10),
MoveDate datetime
);
insert into t1
select 'CAR1', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR1', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR1', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR1', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR1', 'LOC1', 'SUB1' , '1/9/2015 09:09:09' union all
select 'CAR2', 'LOC1', 'SUB1', '1/1/2015 01:01:01' union all
select 'CAR2', 'LOC1', 'SUB2', '1/3/2015 03:03:03' union all
select 'CAR2', 'LOC1', 'SUB1', '1/4/2015 04:04:04' union all
select 'CAR2', 'LOC99', 'SUB99', '1/5/2015 05:05:05' union all
select 'CAR2', 'LOC1', 'SUB1' , '1/9/2015 09:09:09';
select * from t1;
╔═════════╦════════════╦═══════════════╦════════════════════════════════╗
║ ASSETID ║ LOCATIONID ║ SUBLOCATIONID ║ MOVEDATE ║
╠═════════╬════════════╬═══════════════╬════════════════════════════════╣
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 01 2015 01:01:01+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 01 2015 01:01:01+0000 ║
║ CAR2 ║ LOC1 ║ SUB2 ║ January, 03 2015 03:03:03+0000 ║
║ CAR1 ║ LOC1 ║ SUB2 ║ January, 03 2015 03:03:03+0000 ║
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 04 2015 04:04:04+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 04 2015 04:04:04+0000 ║
║ CAR2 ║ LOC99 ║ SUB99 ║ January, 05 2015 05:05:05+0000 ║
║ CAR1 ║ LOC99 ║ SUB99 ║ January, 05 2015 05:05:05+0000 ║
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 09 2015 09:09:09+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 09 2015 09:09:09+0000 ║
╚═════════╩════════════╩═══════════════╩════════════════════════════════╝
If the lead() analytic function is supported, the preferred solution (in terms of both simplicity and performance) would be:
select AssetID, LocationID,
sum(datediff(dd,MoveDate,isnull(nextMoveDate,getDate()))) daysAtLoc
from (
select AssetID, LocationID, MoveDate,
lead(MoveDate) over (partition by AssetID
order by MoveDate) nextMoveDate
from t1
) t2
group by AssetID, LocationID
order by AssetID, LocationID;
╔═════════╦════════════╦═══════════╗
║ ASSETID ║ LOCATIONID ║ DAYSATLOC ║
╠═════════╬════════════╬═══════════╣
║ CAR1 ║ LOC1 ║ 18 ║
║ CAR1 ║ LOC99 ║ 4 ║
║ CAR2 ║ LOC1 ║ 18 ║
║ CAR2 ║ LOC99 ║ 4 ║
╚═════════╩════════════╩═══════════╝
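A pure SQL solution: no analytic functions, no recursive CTE, no OUTER APPLY or correlated subqueries; just plain joins. I have never used Azure SQL, but I would be very surprised if it did not support this (and still called itself SQL).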
select AssetID, LocationID,
sum(datediff(dd,MoveDate,isnull(nextMoveDate,getdate()))) daysAtLoc
from (
select t1.AssetID, LocationID, MoveDate,
min(nextMoveDate) nextMoveDate
from t1
left outer join
(select AssetID, MoveDate nextMoveDate
from t1) n
on t1.AssetId = n.AssetID
and t1.MoveDate < n.nextMoveDate
group by t1.AssetID, LocationID, MoveDate
) t2
group by AssetID, LocationID
order by AssetID, LocationID
╔═════════╦════════════╦═══════════╗
║ ASSETID ║ LOCATIONID ║ DAYSATLOC ║
╠═════════╬════════════╬═══════════╣
║ CAR1 ║ LOC1 ║ 18 ║
║ CAR1 ║ LOC99 ║ 4 ║
║ CAR2 ║ LOC1 ║ 18 ║
║ CAR2 ║ LOC99 ║ 4 ║
╚═════════╩════════════╩═══════════╝
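Performance warning: let n be the maximum number of moves per asset and m the number of assets. The analytic-function version should have Big-O performance of m * (n log n). The pure SQL version should have a Big-O of m * (n * n). So if you track a constant pool of assets but keep adding moves over time (so the number of moves per asset grows steadily), the query will slow down quadratically. If you query over a long period and have hundreds or thousands of moves recorded for a single asset, you may need to compute the results in monthly batches and then sum them. That said, if you have a huge pool of assets with relatively few moves each, the pure SQL version should perform comparably to the analytic-function version.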
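Because the self-join matches rows on AssetID and then compares MoveDate, an index keyed on those two columns is a natural candidate to try. This is a suggestion rather than something the answer prescribes; the index name below is made up, and the index is unnecessary if the table already has a key on (AssetID, MoveDate).

```sql
-- Sketch only: IX_t1_AssetID_MoveDate is a hypothetical name.
-- LocationID is included so the query can be answered from the index alone.
CREATE INDEX IX_t1_AssetID_MoveDate
    ON t1 (AssetID, MoveDate)
    INCLUDE (LocationID);
```

An index does not change the n * n number of row pairs compared per asset; it only makes finding each asset's rows cheaper, so it is worth checking the actual query plan.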
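- Edit 1: Fixed a typo in the original SQL solution (a stray extra character).
- Edit 2: Extended the solution to support date ranges, and tweaked the input data slightly to verify that the solution is robust.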
create table t1 (
AssetID varchar(10),
LocationID varchar(10),
SublocationID varchar(10),
MoveDate datetime,
primary key (AssetId, MoveDate));
insert into t1
select 'CAR1', 'LOC1', 'SUB1', '01/01/2015 00:00:00' union
select 'CAR1', 'LOC1', 'SUB2', '01/03/2015 03:03:03' union
select 'CAR1', 'LOC1', 'SUB1', '01/04/2015 04:04:04' union
select 'CAR1', 'LOC99', 'SUB99', '01/05/2015 05:05:05' union
select 'CAR1', 'LOC1', 'SUB1' , '01/09/2015 09:09:09' union
select 'CAR2', 'LOC1', 'SUB2', '01/03/2015 03:03:03' union
select 'CAR2', 'LOC1', 'SUB1', '01/04/2015 04:04:04' union
select 'CAR2', 'LOC99', 'SUB99', '01/05/2015 05:05:05' union
select 'CAR2', 'LOC1', 'SUB1' , '01/09/2015 09:09:09' union
select 'CAR3', 'LOC2', 'SUB1' , '01/15/2015 15:15:15'
;
╔═════════╦════════════╦═══════════════╦════════════════════════════════╗
║ ASSETID ║ LOCATIONID ║ SUBLOCATIONID ║ MOVEDATE ║
╠═════════╬════════════╬═══════════════╬════════════════════════════════╣
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 01 2015 00:00:00+0000 ║
║ CAR1 ║ LOC1 ║ SUB2 ║ January, 03 2015 03:03:03+0000 ║
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 04 2015 04:04:04+0000 ║
║ CAR1 ║ LOC99 ║ SUB99 ║ January, 05 2015 05:05:05+0000 ║
║ CAR1 ║ LOC1 ║ SUB1 ║ January, 09 2015 09:09:09+0000 ║
║ CAR2 ║ LOC1 ║ SUB2 ║ January, 03 2015 03:03:03+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 04 2015 04:04:04+0000 ║
║ CAR2 ║ LOC99 ║ SUB99 ║ January, 05 2015 05:05:05+0000 ║
║ CAR2 ║ LOC1 ║ SUB1 ║ January, 09 2015 09:09:09+0000 ║
║ CAR3 ║ LOC2 ║ SUB1 ║ January, 15 2015 15:15:15+0000 ║
╚═════════╩════════════╩═══════════════╩════════════════════════════════╝
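Of course, you don't need a table for dt_range; I only used one so I could test several conditions at once. I prefer to handle date ranges as [currentStart, nextStart), with the start inclusive and the next start exclusive, because that makes it much easier to write SQL whose ranges do not overlap, e.g. for monthly reports.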
create table dt_range
(thisStartDate date,
nextStartDate date,
primary key (thisStartDate,nextStartDate));
insert into dt_range
select '01-dec-2014','01-jan-2015' union
select '01-jan-2015','01-feb-2015' union
select '02-jan-2015','09-jan-2015' union
select '01-feb-2015','01-mar-2015' ;
╔═══════════════╦═══════════════╗
║ THISSTARTDATE ║ NEXTSTARTDATE ║
╠═══════════════╬═══════════════╣
║ 2014-12-01 ║ 2015-01-01 ║
║ 2015-01-01 ║ 2015-02-01 ║
║ 2015-01-02 ║ 2015-01-09 ║
║ 2015-02-01 ║ 2015-03-01 ║
╚═══════════════╩═══════════════╝
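As a small illustration of that range convention, a filter that includes the start and excludes the next start never counts a row in two adjacent ranges, even when MoveDate carries a time of day. The variable names below are hypothetical; the table is the t1 defined above.

```sql
-- Sketch only: @thisStart and @nextStart are illustrative variable names.
DECLARE @thisStart date = '20150101',
        @nextStart date = '20150201';

-- Moves recorded in January 2015: start inclusive, next start exclusive.
SELECT AssetID, LocationID, MoveDate
FROM t1
WHERE MoveDate >= @thisStart
  AND MoveDate <  @nextStart;
```

Running the same filter with the February range starts exactly where this one stops, so no move is counted twice and none falls between the two.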
Query:
select thisStartDate, nextStartDate, t.AssetID, ArrivalLocation,
round(sum(datediff(ss,ArrivalTime, DepartureTime))/(24.0*60*60),1) DaysAtLoc
from (
select thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, ArrivalTime,
coalesce(min(MoveDate),nextStartDate) DepartureTime
from (
select assetsInRange.thisStartDate, assetsInRange.nextStartDate, assetsInRange.assetID,
coalesce(ArrivalLocation,InitialLocation) ArrivalLocation,
coalesce(ArrivalTime,assetsInRange.thisStartDate) ArrivalTime
from
(
select thisStartDate, nextStartDate, assetID
from dt_range
join t1 on MoveDate < nextStartDate
group by thisStartDate, nextStartDate, assetID
) assetsInRange
left outer join
(
select thisStartDate, nextStartDate, assetID,
max(MoveDate) precedingDtRangeMoveDt
from dt_range
join t1
on MoveDate < thisStartDate
group by thisStartDate, nextStartDate, assetID
)
precedingMoveDt
on (assetsInRange.assetID = precedingMoveDt.assetID)
left outer join
(
select AssetID, MoveDate precedingDtRangeMoveDt, LocationID initialLocation
from t1
)
precedingMoveLoc
on (precedingMoveDt.assetID = precedingMoveLoc.AssetID
and precedingMoveDt.precedingDtRangeMoveDt = precedingMoveLoc.precedingDtRangeMoveDt)
left outer join
(
select AssetId, LocationId ArrivalLocation, MoveDate ArrivalTime
from t1
)
arrivals
on assetsInRange.AssetID = arrivals.AssetId
and ArrivalTime >= assetsInRange.thisStartDate
and ArrivalTime < assetsInRange.nextStartDate
group by assetsInRange.thisStartDate, assetsInRange.nextStartDate, assetsInRange.AssetId,
coalesce(ArrivalLocation,InitialLocation) ,
coalesce(ArrivalTime,assetsInRange.thisStartDate)
) t
left join t1 on t.assetID = t1.assetID
and t1.MoveDate > ArrivalTime
and t1.MoveDate < nextStartDate
group by thisStartDate, nextStartDate, t.AssetID, ArrivalLocation, ArrivalTime
) t
group by thisStartDate, nextStartDate, t.AssetID, ArrivalLocation
order by 1, 3;
Result:
╔═══════════════╦═══════════════╦═════════╦═════════════════╦═══════════╗
║ THISSTARTDATE ║ NEXTSTARTDATE ║ ASSETID ║ ARRIVALLOCATION ║ DAYSATLOC ║
╠═══════════════╬═══════════════╬═════════╬═════════════════╬═══════════╣
║ 2015-01-01 ║ 2015-02-01 ║ CAR1 ║ LOC1 ║ 26.8 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR1 ║ LOC99 ║ 4.2 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR2 ║ LOC1 ║ 24.7 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR2 ║ LOC99 ║ 4.2 ║
║ 2015-01-01 ║ 2015-02-01 ║ CAR3 ║ LOC2 ║ 16.4 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR1 ║ LOC1 ║ 2.1 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR1 ║ LOC99 ║ 3.8 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR2 ║ LOC1 ║ 2.1 ║
║ 2015-01-02 ║ 2015-01-09 ║ CAR2 ║ LOC99 ║ 3.8 ║
║ 2015-02-01 ║ 2015-03-01 ║ CAR1 ║ LOC1 ║ 28 ║
║ 2015-02-01 ║ 2015-03-01 ║ CAR2 ║ LOC1 ║ 28 ║
║ 2015-02-01 ║ 2015-03-01 ║ CAR3 ║ LOC2 ║ 28 ║
╚═══════════════╩═══════════════╩═════════╩═════════════════╩═══════════╝
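Note: I assume that the first record for an asset means it did not exist at any location before that, so the test range of December 2014 to January 2015 does not appear in the results, because no asset has a move date in 2014.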