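So I have two tables. One tracks a person's location, and one tracks staff shifts.
Staff shifts have a staff ID, a location, start and end times, and the cost of that shift.
People have an eventId, stayId, personId, location, and start and end time. A person will have an event with multiple stays.
What I am trying to do is mesh these two tables together so that I can accurately report the cost of each location stay, based on the duration of the stay multiplied by the relevant cost of the staff covering that location at the time.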
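The problem I am having:
My current approach is to expand both tables so that there is one record per minute. A stay between 1 pm and 2 pm therefore has 60 records, and a 5-hour staff shift has 300 records. For each minute I can then take all the staff working at that location, derive a per-minute value from each staff member's cost divided by the duration of their shift, and apply that value to the matching record in the other table.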
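To make the per-minute expansion described above concrete, here is a minimal Python sketch run against the sample rows from this question. It is only an illustration of the arithmetic, not the actual SQL: the names `shifts`, `stays`, `minutes` and `stay_cost` are made up for this example, and it does not split the cost between several people at the same location.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (location, staff_id, shift_cost, start, end): rows from the sample Staff table
shifts = [
    (1, 99, 40.0, datetime(2015, 5, 20, 4), datetime(2015, 5, 20, 12)),
    (1, 15, 85.0, datetime(2015, 5, 20, 3), datetime(2015, 5, 20, 5)),
    (3, 85, 74.0, datetime(2015, 5, 20, 18), datetime(2015, 5, 20, 20)),
    (4, 10, 36.0, datetime(2015, 5, 20, 6), datetime(2015, 5, 20, 14)),
]
# (stay_id, location, start, end): rows from the sample Location table
stays = [
    (987, 1, datetime(2015, 5, 20, 7), datetime(2015, 5, 20, 8)),
    (374, 4, datetime(2015, 5, 20, 8), datetime(2015, 5, 20, 10)),
    (184, 3, datetime(2015, 5, 20, 10), datetime(2015, 5, 20, 11)),
    (798, 8, datetime(2015, 5, 20, 11), datetime(2015, 5, 20, 12)),
]


def minutes(start, end):
    """Yield every whole minute in the half-open range [start, end)."""
    t = start
    while t < end:
        yield t
        t += timedelta(minutes=1)


# Expand the shifts: total staff cost per (location, minute).
minute_cost = defaultdict(float)
for loc, _staff_id, cost, start, end in shifts:
    shift_minutes = int((end - start).total_seconds() // 60)
    if shift_minutes == 0:
        continue  # skip zero-length shifts to avoid dividing by zero
    for m in minutes(start, end):
        minute_cost[(loc, m)] += cost / shift_minutes

# Expand the stays and look up the staff cost for each minute.
stay_cost = {
    stay_id: sum(minute_cost.get((loc, m), 0.0) for m in minutes(start, end))
    for stay_id, loc, start, end in stays
}
```

Under these assumptions stay 987 picks up 60 minutes of staff 99's shift (40 spread over 480 minutes), and stays 184 and 798 pick up nothing because no shift covers them. The cost of this approach is visible in the loops: the work grows with the number of minutes, not the number of rows.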
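Technologies used:
As you can imagine, I am finding this process very slow: expanded to minute level, my people table has around 500 million records, and the staff table has around 35 million records when the same is done.
Can anyone suggest a better approach?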
Example data:
Location
| EventId | ID | Person | Loc | Start | End
| 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00
| 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00
| 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00
| 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00
Staff
| Loc | StaffID | Cost | Start | End
| 1 | 99 | 40 | May, 20 2015 04:00:00 | May, 20 2015 12:00:00
| 1 | 15 | 85 | May, 20 2015 03:00:00 | May, 20 2015 05:00:00
| 3 | 85 | 74 | May, 20 2015 18:00:00 | May, 20 2015 20:00:00
| 4 | 10 | 36 | May, 20 2015 06:00:00 | May, 20 2015 14:00:00
Result
| EventId | ID | Person | Loc | Start | End | Cost
| 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 | 45.50
| 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 | 81.20
| 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 | 95.00
| 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00 | 14.75
SQL:
Numbers table
;WITH x AS
(
SELECT TOP (224) object_id FROM sys.all_objects
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.object_id)
INTO #numbers
FROM x CROSS JOIN x AS y
ORDER BY n
Staff table
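SELECT
Location,
ISNULL(SUM(ROUND(Cost/ CASE WHEN (DateDiff(MINUTE, StartDateTime, EndDateTime)) = 0 THEN 1 ELSE (DateDiff(MINUTE, StartDateTime, EndDateTime)) END, 5)),0) AS MinuteCost,
Count(Name) AS StaffCount,
RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, StartDateTime) + n.n -1, 0)
INTO #temp_StaffRoster
FROM dbo.StaffRoster
-- Assumed: n.n is referenced above but no join was shown, so expand each shift to one row per minute here
INNER JOIN #numbers n ON n.n BETWEEN 1 AND DATEDIFF(MINUTE, StartDateTime, EndDateTime)
-- Assumed: the aggregates need one group per location and minute
GROUP BY Location, DATEADD(MI, DATEDIFF(MI, 0, StartDateTime) + n.n -1, 0)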
Joining them together, which I think is where I need help:
INSERT INTO dbo.FinalTable
SELECT [EventId]
,[Id]
,[Start]
,[End]
,event.[Location]
,SUM(ISNULL(MinuteCost,1)/ISNULL(PeopleCount, 1)) AS Cost
,AVG(ISNULL(StaffCount,1)) AS AvgStaff
FROM dbo.Events event WITH (NOLOCK)
INNER JOIN #numbers n ON n.n BETWEEN 0 AND DATEDIFF(MINUTE, [Start], [End])
LEFT OUTER JOIN #temp_StaffRoster staff WITH (NOLOCK) ON staff.Location= event.Location AND staff.RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, [Start]) + n.n -1 , 0)
LEFT OUTER JOIN (SELECT [Location], DATEADD(MI, DATEDIFF(MI, 0, [Start]) + n.n -1 , 0) AS Mins, COUNT(Id) as PeopleCount
FROM dbo.Events WITH (NOLOCK)
INNER JOIN #numbers n ON n.n BETWEEN 0 AND DATEDIFF(MINUTE, [Start], [End])
GROUP BY [Location], DATEADD(MI, DATEDIFF(MI, 0, [Start]) + n.n -1 , 0)
) cap ON cap.Location= event.Location AND cap.Mins = DATEADD(MI, DATEDIFF(MI, 0, [Start]) + n.n -1 , 0)
GROUP BY [EventId]
,[Id]
,[Start]
,[End]
,event.[Location]
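Update
So I have two tables. One tracks a person's location, and one tracks staff shifts and their cost. I am trying to combine the two tables to calculate the cost of each location stay.
Here is my approach: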
;WITH stay AS
(
SELECT TOP 650000
StayId,
Location,
Start,
[End]
FROM stg_Stay
WHERE Location IS NOT NULL -- Some locations don't currently have a matching shift location
ORDER BY Location, ADTM
),
shift AS
(
SELECT TOP 36000000
Location,
ShiftMinute,
MinuteCost,
StaffCount
FROM stg_Shifts
ORDER BY Location, ShiftMinute
)
SELECT
[StayId],
SUM(MinuteCost) AS Cost,
AVG(StaffCount) AS StaffCount
INTO newTable
FROM stay S
CROSS APPLY (SELECT MinuteCost, StaffCount
FROM shift R
WHERE R.Location = S.Location
AND R.ShiftMinute >= S.Start AND R.ShiftMinute < S.[End] -- half-open range, so a 60-minute stay matches 60 shift minutes rather than 61
) AS Shifts
GROUP BY [StayId]
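This is where I am at.
I have split the Shifts table down to minute level because the shifts do not line up with each other in any clear way.
stg_Stay contains more columns than this operation needs. stg_Shifts is as shown.
Indexes on stg_Shifts: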
CREATE NONCLUSTERED INDEX IX_Shifts_Loc_Min
ON dbo.stg_Shifts (Location, ShiftMinute)
INCLUDE (MinuteCost, StaffCount);
Indexes on stg_Stay:
CREATE INDEX IX_Stay_StayId ON dbo.stg_Stay (StayId);
CREATE CLUSTERED INDEX IX_Stay_Start_End_Loc ON dbo.stg_Stay (Location,Start,[End]);
Given that Shifts has around 36 million records and Stays has around 650k, what can I do to make this perform better?
Answer 0: (score: 1)
SELECT *
FROM Locations l
OUTER APPLY -- Assume a staff won't appear in different location in the same period of time, of course.
(
SELECT
CONVERT(decimal(14,2), SUM(CostPerMinute * OverlappedMinutes)) AS ActualCost,
COUNT(DISTINCT StaffId) AS StaffCount,
SUM(OverlappedMinutes) AS StaffMinutes
FROM
(
SELECT
*,
-- Calculate overlapped time in minutes
DATEDIFF(MINUTE,
CASE WHEN StartTime > l.StartTime THEN StartTime ELSE l.StartTime END, -- Get greatest start time
CASE WHEN EndTime > l.EndTime THEN l.EndTime ELSE EndTime END -- Get least end time
) AS OverlappedMinutes,
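Cost * 1.0 / NULLIF(DATEDIFF(MINUTE, StartTime, EndTime), 0) AS CostPerMinute -- * 1.0 avoids integer division if Cost is an INT; NULLIF guards against zero-length shifts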
FROM Staff
WHERE LocationId = l.LocationId
AND StartTime <= l.EndTime AND l.StartTime <= EndTime -- Match with overlapped time
) data
) StaffInLoc
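The arithmetic in the query above (greatest start, least end, cost per minute times overlapped minutes) can be checked outside the database. The following is a rough Python sketch of that calculation only; `overlap_minutes`, `stay_cost` and `at` are names invented for this illustration, and the shift rows are the two location 1 shifts from the question's sample data.

```python
from datetime import datetime


def overlap_minutes(a_start, a_end, b_start, b_end):
    """Whole minutes shared by two intervals; 0 when they do not overlap."""
    start = max(a_start, b_start)  # greatest start time
    end = min(a_end, b_end)        # least end time
    return max(0, int((end - start).total_seconds() // 60))


def stay_cost(stay_start, stay_end, shifts):
    """Sum cost-per-minute * overlapped minutes over the given shifts."""
    total = 0.0
    for cost, shift_start, shift_end in shifts:
        shift_len = int((shift_end - shift_start).total_seconds() // 60)
        if shift_len == 0:
            continue  # skip zero-length shifts to avoid dividing by zero
        minutes = overlap_minutes(stay_start, stay_end, shift_start, shift_end)
        total += cost / shift_len * minutes
    return total


def at(hour):
    """Helper for this example: a time on 2015-05-20 at the given hour."""
    return datetime(2015, 5, 20, hour)


# (cost, start, end) for the two sample shifts at location 1
loc1_shifts = [(40.0, at(4), at(12)), (85.0, at(3), at(5))]
cost_987 = stay_cost(at(7), at(8), loc1_shifts)  # stay 987: 07:00 to 08:00
```

Only the 04:00 to 12:00 shift overlaps the stay, for 60 of its 480 minutes; the 03:00 to 05:00 shift contributes nothing. No per-minute rows are generated, so the work depends on the number of stay and shift rows rather than on their durations.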
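Answer 1: (score: 0)
Take this with a grain of salt, because your naming is terrible.
Location should be called Stay, since I would guess Location is another table that defines a single physical location.
Your Staff table is badly named too. Why not call it Shift? I would expect a Staff table to hold names, phone numbers and so on, whereas a Shift table can hold multiple shifts for the same staff member.
Secondly, I think you are missing a relationship between the two tables.
If you join Location and Staff only on location and overlapping date-times, I don't think that makes much sense for what you are trying to do. How do you know which staff member is where at a given time? Joining on location and overlapping dates only works if you assume that an entry in the Location table relates to every staff member who has a shift at that location within the time range. So take what follows as inspiration for solving the problem and for finding overlapping date-time intervals, rather than as an actual solution, because I think your data and model are in poor shape.
If I have got this wrong, please provide the primary and foreign keys on your tables and a better explanation.
Some dummy data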
IF OBJECT_ID('dbo.Location', 'U') IS NOT NULL DROP TABLE dbo.Location -- only drop if it already exists
CREATE TABLE dbo.Location
(
StayId INT,
EventId INT,
PersonId INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 987 ,1 ,123 ,1 ,'2015-05-20T07:00:00','2015-05-20T08:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 374 ,1 ,123 ,4 ,'2015-05-20T08:00:00','2015-05-20T10:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 184 ,1 ,123 ,3 ,'2015-05-20T10:00:00','2015-05-20T11:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 798 ,1 ,123 ,8 ,'2015-05-20T11:00:00','2015-05-20T12:00:00')
IF OBJECT_ID('dbo.Staff', 'U') IS NOT NULL DROP TABLE dbo.Staff -- only drop if it already exists
CREATE TABLE dbo.Staff
(
StaffId INT,
Cost INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 99 ,40 ,1 ,'2015-05-20T04:00:00','2015-05-20T12:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 15 ,85 ,1 ,'2015-05-20T03:00:00','2015-05-20T05:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 85 ,74 ,3 ,'2015-05-20T18:00:00','2015-05-20T20:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 10 ,36 ,4 ,'2015-05-20T06:00:00','2015-05-20T14:00:00')
The actual query
;WITH OnLocation AS
(
SELECT
L.StayId, L.EventId, L.LocationId, L.PersonId, S.Cost
, IIF(L.StartTime > S.StartTime, L.StartTime, S.StartTime) AS OnLocationStartTime
, IIF(L.EndTime < S.EndTime, L.EndTime, S.EndTime) AS OnLocationEndTime
FROM dbo.Location L
LEFT JOIN dbo.Staff S
ON S.LocationId = L.LocationId -- TODO are you not missing a join condition on staffid
-- Detects any overlaps between stays and shifts
AND L.StartTime <= S.EndTime AND L.EndTime >= S.StartTime
)
SELECT
*
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) AS DurationMinutes
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) / 60.0 * Cost AS DurationCost -- treats Cost as an hourly rate
FROM OnLocation D
To get a summary, wrap the query and add a GROUP BY on whatever you want to summarise.
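As a rough picture of what that grouping does, here is a small Python sketch that sums rows shaped like the query's output per stay. The first two rows follow from the dummy data above; the third is a hypothetical extra shift added only so that one stay has more than one row. Like the query, it treats Cost as an hourly rate, and the names `rows` and `summary` are made up for this example.

```python
from collections import defaultdict

# (stay_id, duration_minutes, hourly_cost): shaped like rows from the query above
rows = [
    (987, 60, 40),   # stay 987 overlapping staff 99's shift
    (374, 120, 36),  # stay 374 overlapping staff 10's shift
    (374, 30, 20),   # HYPOTHETICAL second shift, not in the dummy data
]

# Equivalent of GROUP BY StayId with SUM(DurationMinutes) and SUM(DurationCost)
summary = defaultdict(lambda: {"minutes": 0, "cost": 0.0})
for stay_id, duration_minutes, hourly_cost in rows:
    summary[stay_id]["minutes"] += duration_minutes
    summary[stay_id]["cost"] += duration_minutes / 60.0 * hourly_cost
```

Stays with no overlapping shift come back from the LEFT JOIN with NULLs, so in SQL their sums would be NULL unless wrapped in ISNULL or COALESCE.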