Best way to combine two SQL tables

Date: 2015-07-09 04:47:01

Tags: sql-server tsql datetime

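So I have two tables: one tracks a person's location, the other tracks staff shifts.

Staff records have a staff position, location, start and end time, and the cost of that shift.

People records have eventId, stayId, personId, location, and start and end time. A person will have an event with multiple stays.

What I want to do is combine these two tables so I can accurately report the cost of each location stay: the duration of the stay multiplied by the relevant cost of the staff covering that location at that time.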

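The problems I'm running into are:

  1. Location stays don't line up with staff shifts. For example, a person may be at a location from 1pm to 2pm, while 4 staff are on shift there from 12:30 to 1:30 and 2 from 1:30 to 5.
  2. There are a lot of records.
  3. Not all staff are paid the same.
  4. My current approach is to expand both tables to one record per minute. So a stay from 1pm to 2pm gets 60 records, and a 5-hour staff shift gets 300 records. For each minute I can then take all the staff working at that location, get each staff member's per-minute value by dividing their cost by the duration of their shift, and apply that value to the corresponding record in the other table.

    Techniques used:

    1. I created a table of 50,000 numbers, since some stays can be quite long.
    2. I joined the staff table to the numbers table to split each shift into minutes, then grouped by location and minute to get the staff count and the per-minute cost.
    3. The final step, and the one causing the problem, is joining the location table to the numbers table and to the modified staff table to produce the cost for each minute. I also count the people at the location in that minute, to account for staff covering more than one person.
    4. As you can imagine, this process is very slow: my people table has about 500 million records when expanded to the minute level, and the staff table has about 35 million records after the same expansion.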
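In formula form, the per-minute method above computes the following for a stay at location $\ell$ covering minutes $t_0$ up to (but not including) $t_1$. The notation is introduced here for illustration only: $c_s$ is the cost of shift $s$, $\ell_s$ its location, $[a_s, b_s)$ its time span in minutes, $d_s = b_s - a_s$ its length, and $P(m)$ the number of people at $\ell$ during minute $m$.

$$\mathrm{cost}(\text{stay}) = \sum_{m=t_0}^{t_1-1} \frac{1}{P(m)} \sum_{s:\ \ell_s=\ell,\ a_s \le m < b_s} \frac{c_s}{d_s}$$

If the people count is left out ($P(m) = 1$), the two sums can be swapped: each shift then contributes its per-minute cost times the number of minutes it overlaps the stay, which can be computed from the interval endpoints without generating any per-minute rows:

$$\mathrm{cost}(\text{stay}) = \sum_{s:\ \ell_s=\ell} \frac{c_s}{d_s}\,\max\bigl(0,\ \min(t_1,b_s)-\max(t_0,a_s)\bigr)$$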

Can anyone suggest a better approach?

Sample data:

Location

      | EventId |  ID | Person | Loc |         Start         |          End          |
      |    1    | 987 |  123   |  1  | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 |
      |    1    | 374 |  123   |  4  | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 |
      |    1    | 184 |  123   |  3  | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 |
      |    1    | 798 |  123   |  8  | May, 20 2015 11:00:00 | May, 20 2015 12:00:00 |
      

Staff

      | Loc | StaffID | Cost |         Start         |          End          |
      |  1  |   99    |  40  | May, 20 2015 04:00:00 | May, 20 2015 12:00:00 |
      |  1  |   15    |  85  | May, 20 2015 03:00:00 | May, 20 2015 05:00:00 |
      |  3  |   85    |  74  | May, 20 2015 18:00:00 | May, 20 2015 20:00:00 |
      |  4  |   10    |  36  | May, 20 2015 06:00:00 | May, 20 2015 14:00:00 |
      

Expected result

      | EventId |  ID | Person | Loc |         Start         |          End          |  Cost |
      |    1    | 987 |  123   |  1  | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 | 45.50 |
      |    1    | 374 |  123   |  4  | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 | 81.20 |
      |    1    | 184 |  123   |  3  | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 | 95.00 |
      |    1    | 798 |  123   |  8  | May, 20 2015 11:00:00 | May, 20 2015 12:00:00 | 14.75 |

SQL:

Numbers table

      ;WITH x AS 
      (
        SELECT TOP (224) object_id  FROM sys.all_objects 
      )
      SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.object_id) 
      INTO #numbers
      FROM x CROSS JOIN x AS y 
      ORDER BY n
      

Staff table

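      SELECT 
          s.Location,
          -- Per-minute cost of each shift (1.0 * forces decimal division), summed per location-minute
          ISNULL(SUM(ROUND(1.0 * s.Cost / CASE WHEN DATEDIFF(MINUTE, s.StartDateTime, s.EndDateTime) = 0 THEN 1 ELSE DATEDIFF(MINUTE, s.StartDateTime, s.EndDateTime) END, 5)), 0) AS MinuteCost,
          COUNT(s.Name) AS StaffCount,
          RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, s.StartDateTime) + n.n - 1, 0)
      INTO #temp_StaffRoster
      FROM dbo.StaffRoster s
      -- One row per minute of each shift (n starts at 1, so minute offsets run 0 .. duration - 1)
      INNER JOIN #numbers n ON n.n BETWEEN 1 AND DATEDIFF(MINUTE, s.StartDateTime, s.EndDateTime)
      GROUP BY s.Location, DATEADD(MI, DATEDIFF(MI, 0, s.StartDateTime) + n.n - 1, 0)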
      

Bringing it all together, which is where I think I need help:

          INSERT INTO dbo.FinalTable
          SELECT [EventId]
                ,[Id]
                ,[Start]
                ,[End]
                ,event.[Location]
                ,SUM(ISNULL(MinuteCost,1)/ISNULL(PeopleCount, 1)) AS Cost
                ,AVG(ISNULL(StaffCount,1)) AS AvgStaff
            FROM dbo.Events event WITH (NOLOCK) 
            INNER JOIN #numbers n ON n.n BETWEEN 1 AND DATEDIFF(MINUTE, Start, [End])
            LEFT OUTER JOIN #temp_StaffRoster staff WITH (NOLOCK) ON staff.Location = event.Location AND staff.RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n - 1, 0)
            LEFT OUTER JOIN (SELECT [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n - 1, 0) AS Mins, COUNT(Id) AS PeopleCount
                             FROM dbo.Events WITH (NOLOCK) 
                             INNER JOIN #numbers n ON n.n BETWEEN 1 AND DATEDIFF(MINUTE, Start, [End])
                             GROUP BY [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n - 1, 0)
                             ) cap ON cap.Location = event.Location AND cap.Mins = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n - 1, 0)
      
            GROUP BY [EventId]
                    ,[Id]
                    ,[Start]
                    ,[End]
                    ,event.[Location]
      

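Update

So I have two tables: one tracks a person's location, the other tracks staff shift costs. I'm trying to combine the two tables to calculate the cost of each location stay.

Here is my approach: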

      ;WITH stay AS
      (
          SELECT TOP 650000
              StayId,
              Location,
              Start,
              [End]
          FROM stg_Stay
          WHERE Location IS NOT NULL  -- Some locations don't currently have a matching shift location
          ORDER BY Location, ADTM
      ),
      shift AS
      (
          SELECT TOP 36000000
              Location,
              ShiftMinute,
              MinuteCost,
              StaffCount
          FROM stg_Shifts
          ORDER BY Location, ShiftMinute
      )
      
      SELECT 
          [StayId],
          SUM(MinuteCost) AS Cost,
          AVG(StaffCount) AS StaffCount
      INTO newTable
      FROM stay S
      CROSS APPLY (SELECT MinuteCost, StaffCount
                      FROM shift R 
                      WHERE R.Location = S.Location
                       AND R.ShiftMinute >= S.Start AND R.ShiftMinute < S.[End]  -- half-open range: a 60-minute stay matches 60 shift minutes, not 61
                  ) AS Shifts
      GROUP BY [StayId]
      

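This is where I'm at.

I've split the Shifts table down to the minute level, because there is no clean alignment between shifts and stays.

stg_Stay has more columns than this operation needs; stg_Shifts is as shown.

Indexes on stg_Shifts: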

      CREATE NONCLUSTERED INDEX IX_Shifts_Loc_Min
      ON dbo.stg_Shifts (Location, ShiftMinute)
      INCLUDE (MinuteCost, StaffCount); 
      

On stg_Stay:

      CREATE INDEX IX_Stay_StayId ON dbo.stg_Stay (StayId);
      CREATE CLUSTERED INDEX IX_Stay_Start_End_Loc ON dbo.stg_Stay (Location, Start, [End]); 
      

With Shifts at about 36 million records and Stays at about 650k, what can I do to make this perform better?

2 answers:

Answer 0 (score: 1)

  1. Don't split the rows by minute.
  2. Temp tables may help if you can establish a fast relationship between them, i.e. overlapping intervals.
  3. SELECT * 
    FROM Locations l
    OUTER APPLY -- Assumes a staff member can't be at two different locations during the same period, of course.
    (
      SELECT 
        CONVERT(decimal(14,2), SUM(CostPerMinute * OverlappedMinutes)) AS ActualCost,
        COUNT(DISTINCT StaffId) AS StaffCount,
        SUM(OverlappedMinutes) AS StaffMinutes
      FROM
      (
        SELECT 
          *,
          -- Calculate overlapped time in minutes
          DATEDIFF(MINUTE,
            CASE WHEN StartTime > l.StartTime THEN StartTime ELSE l.StartTime END, -- Get greatest start time
            CASE WHEN EndTime > l.EndTime THEN l.EndTime ELSE EndTime END -- Get least end time
          ) AS OverlappedMinutes,
        Cost * 1.0 / NULLIF(DATEDIFF(MINUTE, StartTime, EndTime), 0) AS CostPerMinute -- * 1.0 avoids integer division; NULLIF avoids divide-by-zero
        FROM Staff 
        WHERE LocationId = l.LocationId 
          AND StartTime <= l.EndTime AND l.StartTime <= EndTime -- Match with overlapped time
      ) data
    ) StaffInLoc
    

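To see the OUTER APPLY approach above end to end, here is a self-contained sketch that loads the question's sample data into temp tables and computes one cost per stay without expanding anything to minutes. The names #Stay, #Shift and #StayCost are made up for this example, and it assumes Cost is the total cost of a shift, as the question describes. It differs from the answer's query in two small ways: it multiplies before dividing, so no precision is lost to an intermediate per-minute rate, and it uses a strict overlap test, so a shift that merely touches a stay is skipped.

```sql
-- Hypothetical temp tables holding the question's sample data.
IF OBJECT_ID('tempdb..#Stay') IS NOT NULL DROP TABLE #Stay;
IF OBJECT_ID('tempdb..#Shift') IS NOT NULL DROP TABLE #Shift;
IF OBJECT_ID('tempdb..#StayCost') IS NOT NULL DROP TABLE #StayCost;

CREATE TABLE #Stay  (StayId INT PRIMARY KEY, LocationId INT, StartTime DATETIME2(0), EndTime DATETIME2(0));
CREATE TABLE #Shift (StaffId INT, Cost DECIMAL(10,2), LocationId INT, StartTime DATETIME2(0), EndTime DATETIME2(0));

INSERT INTO #Stay (StayId, LocationId, StartTime, EndTime) VALUES
    (987, 1, '2015-05-20T07:00:00', '2015-05-20T08:00:00'),
    (374, 4, '2015-05-20T08:00:00', '2015-05-20T10:00:00'),
    (184, 3, '2015-05-20T10:00:00', '2015-05-20T11:00:00'),
    (798, 8, '2015-05-20T11:00:00', '2015-05-20T12:00:00');

INSERT INTO #Shift (StaffId, Cost, LocationId, StartTime, EndTime) VALUES
    (99, 40, 1, '2015-05-20T04:00:00', '2015-05-20T12:00:00'),
    (15, 85, 1, '2015-05-20T03:00:00', '2015-05-20T05:00:00'),
    (85, 74, 3, '2015-05-20T18:00:00', '2015-05-20T20:00:00'),
    (10, 36, 4, '2015-05-20T06:00:00', '2015-05-20T14:00:00');

-- One output row per stay; shifts are matched on location and interval overlap only.
SELECT st.StayId, c.ActualCost, c.StaffCount
INTO #StayCost
FROM #Stay AS st
OUTER APPLY
(
    SELECT
        -- Each overlapping shift contributes Cost * overlapped minutes / shift minutes.
        CONVERT(DECIMAL(14,2),
            SUM(sh.Cost * o.OverlappedMinutes / NULLIF(DATEDIFF(MINUTE, sh.StartTime, sh.EndTime), 0))) AS ActualCost,
        COUNT(DISTINCT sh.StaffId) AS StaffCount
    FROM #Shift AS sh
    CROSS APPLY
    (
        -- Overlap length = least end time minus greatest start time.
        SELECT DATEDIFF(MINUTE,
                   CASE WHEN sh.StartTime > st.StartTime THEN sh.StartTime ELSE st.StartTime END,
                   CASE WHEN sh.EndTime < st.EndTime THEN sh.EndTime ELSE st.EndTime END) AS OverlappedMinutes
    ) AS o
    WHERE sh.LocationId = st.LocationId
      AND sh.StartTime < st.EndTime   -- strict overlap test: shifts that only touch the stay are skipped
      AND st.StartTime < sh.EndTime
) AS c;

SELECT StayId, ActualCost, StaffCount FROM #StayCost ORDER BY StayId;
```

With the sample data, stay 987 picks up 40 × 60 / 480 = 5.00 from staff 99, stay 374 picks up 36 × 120 / 480 = 9.00 from staff 10, and stays 184 and 798 get a NULL cost and a StaffCount of 0 because no shift at those locations overlaps them. Because the aggregate inside OUTER APPLY has no GROUP BY, every stay still gets exactly one row.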

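Answer 1 (score: 0)

Take this with a grain of salt, because your naming is terrible.

Location should really be Stay, since I'd guess Location is another table that defines a single physical location.

Your staff table is badly named too. Why not call it Shift? I'd expect a Staff table to hold things like name, phone number and so on, whereas a Shift table could hold several shifts for the same staff member.

Secondly, I think you're missing a relationship between the two tables.

If you join Location and Staff only on location and overlapping datetimes, I don't think that makes much sense for what you're trying to do. How do you know which staff member is at which location at a given time? Joining on location and overlapping dates only works if you assume that an entry in the Location table relates to every staff member who has a shift at that location during that time frame. So treat the code below more as inspiration for solving the problem, and for how to find overlapping datetime intervals, than as an actual solution, because I think your data and model are in poor shape.

If I've got this wrong, please provide the primary and foreign keys on your tables and a better explanation.

Some dummy data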

IF OBJECT_ID('dbo.Location', 'U') IS NOT NULL DROP TABLE dbo.Location;
CREATE TABLE dbo.Location
(
StayId INT,
EventId INT,
PersonId INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)


INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES  ( 987 ,1 ,123 ,1 ,'2015-05-20T07:00:00','2015-05-20T08:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES  ( 374 ,1 ,123 ,4 ,'2015-05-20T08:00:00','2015-05-20T10:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES  ( 184 ,1 ,123 ,3 ,'2015-05-20T10:00:00','2015-05-20T11:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES  ( 798 ,1 ,123 ,8 ,'2015-05-20T11:00:00','2015-05-20T12:00:00')

IF OBJECT_ID('dbo.Staff', 'U') IS NOT NULL DROP TABLE dbo.Staff;
CREATE TABLE dbo.Staff
(
StaffId INT,
Cost INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)

INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES  ( 99 ,40 ,1 ,'2015-05-20T04:00:00','2015-05-20T12:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES  ( 15 ,85 ,1 ,'2015-05-20T03:00:00','2015-05-20T05:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES  ( 85 ,74 ,3 ,'2015-05-20T18:00:00','2015-05-20T20:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES  ( 10 ,36 ,4 ,'2015-05-20T06:00:00','2015-05-20T14:00:00')

The actual query

;WITH OnLocation AS
(
    SELECT 
    L.StayId, L.EventId, L.LocationId, L.PersonId, S.Cost
    , IIF(L.StartTime > S.StartTime, L.StartTime, S.StartTime) AS OnLocationStartTime
    , IIF(L.EndTime < S.EndTime, L.EndTime, S.EndTime) AS OnLocationEndTime      
    FROM dbo.Location L
    LEFT JOIN dbo.Staff S
    ON S.LocationId = L.LocationId  -- TODO: aren't you missing a join condition on StaffId?
    -- Detects any overlaps between stays and shifts
    AND L.StartTime <= S.EndTime AND L.EndTime >= S.StartTime
)

SELECT 
*
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) AS DurationMinutes
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) / 60.0 * Cost AS DurationCost
FROM OnLocation D

To get a summary, you can query this and add a GROUP BY on whatever you want to summarize.
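A minimal sketch of that summary, grouping this answer's query by stay. The dummy rows are loaded into table variables so it runs on its own; @Location, @Staff and #StaySummary are illustrative names, not part of the answer. It keeps the answer's assumption that Cost is an hourly rate, which is why the overlapped minutes are divided by 60.0.

```sql
-- Hypothetical table variables mirroring the dummy dbo.Location / dbo.Staff rows,
-- so the sketch runs on its own (IIF needs SQL Server 2012 or later).
IF OBJECT_ID('tempdb..#StaySummary') IS NOT NULL DROP TABLE #StaySummary;

DECLARE @Location TABLE (StayId INT, EventId INT, PersonId INT, LocationId INT,
                         StartTime DATETIME2(0), EndTime DATETIME2(0));
DECLARE @Staff TABLE (StaffId INT, Cost INT, LocationId INT,
                      StartTime DATETIME2(0), EndTime DATETIME2(0));

INSERT INTO @Location VALUES
    (987, 1, 123, 1, '2015-05-20T07:00:00', '2015-05-20T08:00:00'),
    (374, 1, 123, 4, '2015-05-20T08:00:00', '2015-05-20T10:00:00'),
    (184, 1, 123, 3, '2015-05-20T10:00:00', '2015-05-20T11:00:00'),
    (798, 1, 123, 8, '2015-05-20T11:00:00', '2015-05-20T12:00:00');

INSERT INTO @Staff VALUES
    (99, 40, 1, '2015-05-20T04:00:00', '2015-05-20T12:00:00'),
    (15, 85, 1, '2015-05-20T03:00:00', '2015-05-20T05:00:00'),
    (85, 74, 3, '2015-05-20T18:00:00', '2015-05-20T20:00:00'),
    (10, 36, 4, '2015-05-20T06:00:00', '2015-05-20T14:00:00');

-- Same overlap logic as the answer's OnLocation CTE.
WITH OnLocation AS
(
    SELECT L.StayId, L.EventId, L.PersonId, L.LocationId, S.StaffId, S.Cost,
           IIF(L.StartTime > S.StartTime, L.StartTime, S.StartTime) AS OnLocationStartTime,
           IIF(L.EndTime < S.EndTime, L.EndTime, S.EndTime) AS OnLocationEndTime
    FROM @Location AS L
    LEFT JOIN @Staff AS S
      ON S.LocationId = L.LocationId
     AND L.StartTime <= S.EndTime AND L.EndTime >= S.StartTime
)
-- One summary row per stay; Cost is treated as an hourly rate, as in the answer.
SELECT StayId, EventId, PersonId, LocationId,
       COUNT(StaffId) AS StaffCount,  -- 0 when no shift overlaps the stay
       SUM(DATEDIFF(MINUTE, OnLocationStartTime, OnLocationEndTime) / 60.0 * Cost) AS TotalCost
INTO #StaySummary
FROM OnLocation
GROUP BY StayId, EventId, PersonId, LocationId;

SELECT * FROM #StaySummary ORDER BY StayId;
```

With the dummy rows, stay 987 totals 40 (one hour of staff 99), stay 374 totals 72 (two hours of staff 10), and stays 184 and 798 get a StaffCount of 0 and a NULL TotalCost, because the LEFT JOIN keeps them with no matching shift.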