SQL: Best way to build a timeline from two history tables

Time: 2012-06-14 19:59:36

Tags: sql tsql

Consider the following:

CREATE TABLE Members (MemberID INT)
INSERT Members VALUES (1001)

CREATE TABLE PCPs (PCPID INT)
INSERT PCPs VALUES (231)
INSERT PCPs VALUES (327)
INSERT PCPs VALUES (390)

CREATE TABLE Plans (PlanID INT)
INSERT Plans VALUES (555)
INSERT Plans VALUES (762)

CREATE TABLE MemberPCP (
    MemberID INT
    , PCP INT
    , StartDate DATETIME
    , EndDate DATETIME)
INSERT MemberPCP VALUES (1001, 231, '2002-01-01', '2002-06-30')
INSERT MemberPCP VALUES (1001, 327, '2002-07-01', '2003-05-31')
INSERT MemberPCP VALUES (1001, 390, '2003-06-01', '2003-12-31')

CREATE TABLE MemberPlans (
    MemberID INT
    , PlanID INT
    , StartDate DATETIME
    , EndDate DATETIME)
INSERT MemberPlans VALUES (1001, 555, '2002-01-01', '2003-03-31')
INSERT MemberPlans VALUES (1001, 762, '2003-04-01', '2003-12-31')

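I'm looking for a clean way to construct a timeline of the Member/PCP/Plan relationship, where a change in either the PCP or the plan for a member results in a separate start/end row in the output. For example, if over a few years a member changed their PCP twice and their plan once, each on a different date, I would expect to see something like this: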

MemberID  PCP  PlanID  StartDate    EndDate
1001      231  555     2002-01-01   2002-06-30
1001      327  555     2002-07-01   2003-03-31
1001      327  762     2003-04-01   2003-05-31
1001      390  762     2003-06-01   2003-12-31

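As you can see, I need a separate result row for each date period in which the Member/PCP/Plan association differs. I have a solution in place, but it is very convoluted, with a lot of CASE statements and conditional logic in the WHERE clause. I can't help thinking there is a much simpler way to do this.

Thanks.

4 answers:

Answer 0 (score: 2)

Compatible with T-SQL. I agree with Glenn's general approach.

Another suggestion: if you allow gaps between your business periods, this code will need further tweaking. Otherwise, I think deriving the EndDate value from the next record's StartDate would give you more controlled behavior. In that case you would want to enforce that rule before the data reaches this query.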
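The idea of deriving EndDate from the next record's StartDate could be sketched roughly as follows. This is only an illustration, not part of the original answer: it assumes periods for a member are meant to be contiguous, that dates carry no time part, and the column alias DerivedEndDate is made up for the example.

```sql
-- Sketch: compute each PCP period's end as the day before the member's
-- next PCP period starts. DerivedEndDate is a hypothetical alias.
SELECT
  p.MemberID,
  p.PCP,
  p.StartDate,
  DerivedEndDate = DATEADD(DAY, -1, nxt.StartDate)  -- NULL for the latest period
FROM MemberPCP p
  OUTER APPLY (
    -- earliest later StartDate for the same member, if any
    SELECT TOP 1 n.StartDate
    FROM MemberPCP n
    WHERE n.MemberID = p.MemberID
      AND n.StartDate > p.StartDate
    ORDER BY n.StartDate
  ) nxt
ORDER BY p.MemberID, p.StartDate
```

The most recent period for each member gets a NULL end date here, so an open-ended period would have to be handled by whatever consumes the result.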

Edit: I just learned about the WITH statement and SQL Fiddle from Andriy M's post. You can also see my answer at SQL Fiddle.

Edit: fixed a bug pointed out by Andriy.

WITH StartDates AS (
SELECT MemberId, StartDate FROM MemberPCP UNION
SELECT MemberId, StartDate FROM MemberPlans UNION
SELECT MemberId, EndDate + 1 FROM MemberPCP UNION
SELECT MemberId, EndDate + 1 FROM MemberPlans
),
EndDates AS (
SELECT MemberId, EndDate = StartDate - 1 FROM MemberPCP UNION
SELECT MemberId, StartDate - 1 FROM MemberPlans UNION
SELECT MemberId, EndDate FROM MemberPCP UNION
SELECT MemberId, EndDate FROM MemberPlans
),
Periods AS (
SELECT s.MemberId, s.StartDate, EndDate = min(e.EndDate)
  FROM StartDates s
       INNER JOIN EndDates e
           ON s.StartDate <= e.EndDate
          AND s.MemberId = e.MemberId
 GROUP BY s.MemberId, s.StartDate
)
SELECT MemberId = p.MemberId,
       pcp.PCP, pl.PlanId,
       p.StartDate, p.EndDate
  FROM Periods p
       LEFT JOIN MemberPCP pcp
           -- because of the way the periods were divided, at most one
           -- record fits this join clause (assuming no overlapping rows)
           ON p.StartDate >= pcp.StartDate
          AND p.EndDate <= pcp.EndDate
          AND p.MemberId = pcp.MemberId
       LEFT JOIN MemberPlans pl
           ON p.StartDate >= pl.StartDate
          AND p.EndDate <= pl.EndDate
          AND p.MemberId = pl.MemberId
 ORDER BY p.MemberId, p.StartDate
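
The joins above rely on a member never having two overlapping rows in the same history table. A quick way to check that assumption before running the query might look like the sketch below. It is not part of the original answer, and the Overlapping* column aliases are made up for the example.

```sql
-- Sketch: list pairs of MemberPCP rows for the same member whose
-- date ranges overlap. An empty result means the assumption holds.
SELECT
  a.MemberID,
  a.PCP,
  a.StartDate,
  a.EndDate,
  OverlappingPCP       = b.PCP,
  OverlappingStartDate = b.StartDate,
  OverlappingEndDate   = b.EndDate
FROM MemberPCP a
  INNER JOIN MemberPCP b
    ON  b.MemberID  = a.MemberID
    AND b.StartDate <= a.EndDate     -- b starts on or before the day a ends
    AND a.StartDate <= b.EndDate     -- and a starts on or before the day b ends
    AND (a.StartDate < b.StartDate   -- report each pair once, skip self-matches
         OR (a.StartDate = b.StartDate AND a.PCP < b.PCP))
```

For the sample data in the question this returns no rows. Exact duplicate rows (same member, PCP and StartDate) are not reported by this sketch and would need a separate GROUP BY ... HAVING COUNT(*) > 1 check. The same query can be pointed at MemberPlans by swapping the table and column names.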

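Answer 1 (score: 1)

Possibly not the most efficient, but at least a simple and straightforward solution. I would do the following:

  1) expand the ranges;
  2) join the expanded ranges;
  3) group the results.

This, of course, assumes that only dates are used (i.e. the time part is 00:00 for every StartDate and EndDate in both tables).

To expand the date ranges, I prefer using a numbers table, like this: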

SELECT
  m.MemberID,
  m.PCP,
  Date = DATEADD(DAY, n.Number, m.StartDate)
FROM MemberPCP m
  INNER JOIN Numbers n
    ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate)

The same goes for MemberPlans.
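
The queries in this answer assume that a Numbers table with an integer Number column already exists. If it does not, one possible way to create it is sketched below. The range 0 to 9999 is an arbitrary choice for illustration (about 27 years of days), and the VALUES row constructor requires SQL Server 2008 or later.

```sql
-- Sketch: build a Numbers table holding the integers 0..9999.
CREATE TABLE Numbers (Number INT NOT NULL PRIMARY KEY);

WITH Digits AS (
  SELECT d
  FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v(d)
)
INSERT Numbers (Number)
SELECT ones.d + 10 * tens.d + 100 * hundreds.d + 1000 * thousands.d
FROM Digits ones
  CROSS JOIN Digits tens
  CROSS JOIN Digits hundreds
  CROSS JOIN Digits thousands;
```

The table needs to contain at least as many numbers as the longest range has days, otherwise the expansion silently stops short of EndDate.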

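To produce the combined row set, I would use FULL JOIN, although if you know in advance that both tables cover exactly the same time period, INNER JOIN would do just as well: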

SELECT *
FROM MemberPCPExpanded pcp
  FULL JOIN MemberPlansExpanded plans
    ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date

Now you only need to group the resulting rows and find the minimum and maximum dates for every combination of (MemberID, PCP, PlanID):

SELECT
  MemberID  = ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID,
  StartDate = MIN(ISNULL(pcp.Date, plans.Date)),
  EndDate   = MAX(ISNULL(pcp.Date, plans.Date))
FROM MemberPCPExpanded pcp
  FULL JOIN MemberPlansExpanded plans
    ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date
GROUP BY
  ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID

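Note that if you use INNER JOIN instead of FULL JOIN, you will not need all those ISNULL() expressions. It is enough to pick either table's column, e.g. pcp.MemberID instead of ISNULL(pcp.MemberID, plans.MemberID) and pcp.Date instead of ISNULL(pcp.Date, plans.Date).

The complete query might look like this: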

WITH MemberPCPExpanded AS (
  SELECT
    m.MemberID,
    m.PCP,
    Date = DATEADD(DAY, n.Number, m.StartDate)
  FROM MemberPCP m
    INNER JOIN Numbers n
      ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate)
),
MemberPlansExpanded AS (
  SELECT
    m.MemberID,
    m.PlanID,
    Date = DATEADD(DAY, n.Number, m.StartDate)
  FROM MemberPlans m
    INNER JOIN Numbers n
      ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate)
)
SELECT
  MemberID  = ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID,
  StartDate = MIN(ISNULL(pcp.Date, plans.Date)),
  EndDate   = MAX(ISNULL(pcp.Date, plans.Date))
FROM MemberPCPExpanded pcp
  FULL JOIN MemberPlansExpanded plans
    ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date
GROUP BY
  ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID
ORDER BY
  MemberID,
  StartDate

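You can try this query at SQL Fiddle.

Answer 2 (score: 0)

My approach is to take the unique combinations of start dates for each member as the starting point and then build out the other pieces of the query from there: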

--
-- Traverse down a list of 
-- unique Member ID and StartDates
-- 
-- For each row find the most 
-- recent PCP for that member 
-- which started on or before
-- the start date of the current
-- row in the traversal
--
-- For each row find the most 
-- recent PlanID for that member
-- which started on or before
-- the start date of the current
-- row in the traversal
-- 
-- For each row find the earliest
-- end date for that member
-- (from a collection of unique
-- member end dates) that happened
-- after the start date of the
-- current row in the traversal
-- 
SELECT MemberID,
  (SELECT TOP 1 PCP 
   FROM MemberPCP 
   WHERE MemberID = s.MemberID 
   AND StartDate <= s.StartDate 
   ORDER BY StartDate DESC
  ) AS PCP,
  (SELECT TOP 1 PlanID 
   FROM MemberPlans 
   WHERE MemberID = s.MemberID 
   AND StartDate <= s.StartDate 
   ORDER BY StartDate DESC
  ) AS PlanID,
  StartDate,  
  (SELECT TOP 1 EndDate 
   FROM (
    SELECT MemberID, EndDate 
    FROM MemberPlans 
    UNION 
    SELECT MemberID, EndDate 
    FROM MemberPCP) e
   WHERE MemberID = s.MemberID
   AND EndDate >= s.StartDate
   ORDER BY EndDate
  ) AS EndDate
FROM ( 
  SELECT
    MemberID,
    StartDate
  FROM MemberPlans
  UNION 
  SELECT
    MemberID,
    Startdate
  FROM MemberPCP
) s
ORDER BY StartDate

Answer 3 (score: 0)

Maybe this will give you some ideas to start with:

SELECT y.memberid, y.pcp, z.planid, x.startdate, x.enddate
  FROM (
        WITH startdates AS (

            SELECT startdate FROM memberpcp
            UNION
            SELECT startdate FROM memberplans
            UNION
            SELECT enddate + 1 FROM memberpcp
            UNION
            SELECT enddate + 1 FROM memberplans

            ), enddates AS (
            SELECT enddate FROM memberpcp
            UNION
            SELECT enddate FROM memberplans

          )

        SELECT s.startdate, e.enddate
          FROM startdates s 
              ,enddates e
          WHERE e.enddate = (SELECT MIN(enddate)
                               FROM enddates
                               WHERE enddate > s.startdate)
       ) x
       ,memberpcp y
       ,memberplans z

  WHERE (y.startdate, y.enddate) = (SELECT startdate, enddate FROM memberpcp WHERE startdate <= x.startdate AND enddate >= x.enddate)
    AND (z.startdate, z.enddate) = (SELECT startdate, enddate FROM memberplans WHERE startdate <= x.startdate AND enddate >= x.enddate)

I ran this on Oracle and got these results:

1001    231 555 01-JAN-02   30-JUN-02
1001    327 555 01-JUL-02   31-MAR-03
1001    327 762 01-APR-03   31-MAY-03
1001    390 762 01-JUN-03   31-DEC-03

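The idea is to first define the distinct date ranges. That is done in the "WITH" clause. Then a lookup is done for each range in the other tables. There are a lot of assumptions here regarding overlapping ranges and so on, but perhaps it is a start. I tried to do this without analytic functions, since T-SQL may not have good support for them. I don't know. When building the real date ranges, the ranges would also need to be built per memberid.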
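
For comparison, here is a rough T-SQL sketch of the same idea with the ranges built per member and using an analytic function. It is not from the original answer. It assumes SQL Server 2012 or later (where LEAD is available), date-only values, and no overlapping rows within either history table. Boundaries and BoundaryDate are names made up for the example.

```sql
-- Sketch: every StartDate and every EndDate + 1 is a boundary where
-- something may change; LEAD turns consecutive boundaries into periods.
WITH Boundaries AS (
  SELECT MemberID, StartDate AS BoundaryDate FROM MemberPCP UNION
  SELECT MemberID, StartDate FROM MemberPlans UNION
  SELECT MemberID, DATEADD(DAY, 1, EndDate) FROM MemberPCP UNION
  SELECT MemberID, DATEADD(DAY, 1, EndDate) FROM MemberPlans
),
Periods AS (
  SELECT
    MemberID,
    StartDate = BoundaryDate,
    -- a period ends the day before the member's next boundary
    EndDate = DATEADD(DAY, -1,
                LEAD(BoundaryDate) OVER (PARTITION BY MemberID
                                         ORDER BY BoundaryDate))
  FROM Boundaries
)
SELECT p.MemberID, pcp.PCP, pl.PlanID, p.StartDate, p.EndDate
FROM Periods p
  LEFT JOIN MemberPCP pcp
    ON  pcp.MemberID  = p.MemberID
    AND pcp.StartDate <= p.StartDate
    AND pcp.EndDate   >= p.EndDate
  LEFT JOIN MemberPlans pl
    ON  pl.MemberID  = p.MemberID
    AND pl.StartDate <= p.StartDate
    AND pl.EndDate   >= p.EndDate
WHERE p.EndDate IS NOT NULL                          -- drop the trailing boundary
  AND (pcp.PCP IS NOT NULL OR pl.PlanID IS NOT NULL) -- drop gaps covered by neither table
ORDER BY p.MemberID, p.StartDate
```

On the sample data from the question this should produce the same four rows as shown above. Because the window is partitioned by MemberID, each member's periods are derived only from that member's own dates.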