How do I pull the row from table2 whose DATE field is closest to the one in table1?

Time: 2013-11-07 13:35:36

Tags: sql sql-server tsql sql-server-2012

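I have two tables, Diagnose and Exercise. For each diagnosis I want to pull the exercise date closest to Diagnose_Date, and it should be exactly 1 row from the Exercise table.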

I have tried a LEFT JOIN with a condition that uses the DATEDIFF function:

SELECT D.ID,D.Diagnose_Date,D.Type1,D.SubType1,E.Exercise_Date,E.Field1,E.Field2,E.Field3
FROM Diagnose D
LEFT JOIN Exercise E
ON D.ID=E.ID
WHERE DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date]) BETWEEN -30 AND 30

Any help would be much appreciated.

Thanks in advance.


Diagnose table

------------------------------------------
ID     Diagnose_Date    Type1    SubType1    
------------------------------------------
1      10/01/2010       01       1.1
2      20/02/2012       02       2.2
3      30/03/2013       01       1.2
------------------------------------------

Exercise table

------------------------------------------
ID     Exercise_Date  Field1  Field2  Field3
------------------------------------------
1      01/01/2010        x       y      z
2      10/02/2012        a       b      c
2      01/04/2012        e       f      f
3      01/03/2013        x       y      z
3      05/04/2013        a       b      c
3      01/06/2013        x       y      z
------------------------------------------

The expected result should be:

------------------------------------------------------------------------
ID  Diagnose_Date  Exercise_Date Type1 SubType1  Field1  Field2  Field3
------------------------------------------------------------------------
1   10/01/2010     01/01/2010     01    1.1         x       y        z
2   20/02/2012     10/02/2012     02    2.2         a       b        c
3   30/03/2013     05/04/2013     01    1.2         a       b        c
-------------------------------------------------------------------------

4 answers:

Answer 0 (score: 2)

First, in a CTE, get for each diagnosis the minimum interval between the diagnosis date and all exercise dates associated with that diagnosis.

WITH MIN_DATES_CTE(ID, DATE_DIFF)
AS (
    SELECT E.ID, MIN(ABS(DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date])))
    FROM Exercise E
    INNER JOIN Diagnose D ON D.ID = E.ID
    GROUP BY E.ID
)

Then join Diagnose and Exercise on ID and on that minimum interval:

SELECT D.ID,D.Diagnose_Date,D.Type1,D.SubType1,E.Exercise_Date,E.Field1,E.Field2,E.Field3
FROM Diagnose D
LEFT JOIN Exercise E ON D.ID = E.ID
INNER JOIN MIN_DATES_CTE ON MIN_DATES_CTE.ID = E.ID
WHERE ABS(DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date])) = MIN_DATES_CTE.DATE_DIFF
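To check the two steps above against the question's sample data, here is a minimal sketch that runs them in an in-memory SQLite database from Python. It is an adaptation, not the T-SQL itself: SQLite has no DATEDIFF, so a `julianday()` difference stands in for it (an assumption of this sketch), and the dd/mm/yyyy dates are rewritten as ISO strings.

```python
import sqlite3

# In-memory copy of the question's sample data, dates rewritten as ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Diagnose (ID INTEGER, Diagnose_Date TEXT, Type1 TEXT, SubType1 TEXT);
CREATE TABLE Exercise (ID INTEGER, Exercise_Date TEXT, Field1 TEXT, Field2 TEXT, Field3 TEXT);
INSERT INTO Diagnose VALUES
    (1, '2010-01-10', '01', '1.1'),
    (2, '2012-02-20', '02', '2.2'),
    (3, '2013-03-30', '01', '1.2');
INSERT INTO Exercise VALUES
    (1, '2010-01-01', 'x', 'y', 'z'),
    (2, '2012-02-10', 'a', 'b', 'c'),
    (2, '2012-04-01', 'e', 'f', 'f'),
    (3, '2013-03-01', 'x', 'y', 'z'),
    (3, '2013-04-05', 'a', 'b', 'c'),
    (3, '2013-06-01', 'x', 'y', 'z');
""")

# Assumption: julianday(a) - julianday(b) replaces T-SQL's DATEDIFF(DAY, b, a).
query = """
WITH MIN_DATES_CTE (ID, DATE_DIFF) AS (
    SELECT E.ID,
           MIN(ABS(julianday(E.Exercise_Date) - julianday(D.Diagnose_Date)))
    FROM Exercise E
    INNER JOIN Diagnose D ON D.ID = E.ID
    GROUP BY E.ID
)
SELECT D.ID, D.Diagnose_Date, E.Exercise_Date, E.Field1, E.Field2, E.Field3
FROM Diagnose D
INNER JOIN Exercise E ON D.ID = E.ID
INNER JOIN MIN_DATES_CTE M ON M.ID = E.ID
WHERE ABS(julianday(E.Exercise_Date) - julianday(D.Diagnose_Date)) = M.DATE_DIFF
ORDER BY D.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One thing to keep in mind with this approach: if two exercise dates are equally far from the diagnosis date, both match the minimum interval, so that diagnosis comes back with two rows rather than one.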

Answer 1 (score: 1)

I'm assuming you simply want to match each Diagnosis entry with a single Exercise entry, based on which dates are closest to each other.

Here is my train of thought:
Do a full (cross) JOIN of Diagnosis and Exercise, ordered by the absolute date difference, ascending.

SELECT
    D.ID,
    D.Date,
    E.ID,
    E.Date,
    ABS(DATEDIFF(day, D.Date, E.Date)) Diff

FROM Diagnosis D, Exercise E
ORDER BY Diff

You'd get a result like this:

ID  Date        ID  Date        Diff
3   2013-03-30  5   2013-03-25  5
2   2012-02-20  2   2012-02-10  10
3   2013-03-30  4   2013-03-01  29
2   2012-02-20  3   2012-04-01  41
3   2013-03-30  6   2013-06-01  63
1   2010-10-01  1   2010-01-01  273
3   2013-03-30  3   2012-04-01  363
2   2012-02-20  4   2013-03-01  375
2   2012-02-20  5   2013-03-25  399
3   2013-03-30  2   2012-02-10  414
2   2012-02-20  6   2013-06-01  467
1   2010-10-01  2   2012-02-10  497
1   2010-10-01  3   2012-04-01  548
2   2012-02-20  1   2010-01-01  780
1   2010-10-01  4   2013-03-01  882
1   2010-10-01  5   2013-03-25  906
1   2010-10-01  6   2013-06-01  974
3   2013-03-30  1   2010-01-01  1184

Now you can see which dates are closest to each other, and how many days apart they are.

Of course you wouldn't use this as-is, but from this list you can pick the first row:

SELECT TOP 1
    D.ID,
    D.Date,
    E.ID,
    E.Date,
    ABS(DATEDIFF(day, D.Date, E.Date)) Diff

FROM Diagnosis D, Exercise E
ORDER BY Diff

Now you can plug this statement into a LEFT JOIN, so that each diagnosis is matched individually with its closest exercise date.
Like this:

SELECT
    fD.ID,
    fD.Date,
    fE.ID,
    fE.Date
FROM
    Diagnosis fD
    LEFT JOIN Exercise fE
        ON fE.ID = (SELECT TOP 1 E.ID
                        FROM Diagnosis D, Exercise E
                        WHERE D.ID = fD.ID
                        ORDER BY ABS(DATEDIFF(day, D.Date, E.Date)))

The result is:

ID  Date        ID  Date
1   2010-10-01  1   2010-01-01
2   2012-02-20  2   2012-02-10
3   2013-03-30  5   2013-03-25

Answer 2 (score: 1)

You can use OUTER APPLY:

SELECT  d.ID, 
        d.Diagnose_Date, 
        d.Type1, 
        d.SubType1, 
        e.Exercise_Date, 
        e.Field1, 
        e.Field2, 
        e.Field3
FROM    Diagnose d
        OUTER APPLY
        (   SELECT  TOP 1 Exercise_Date, Field1, Field2, Field3
            FROM    Exercise e
            WHERE   d.ID = e.ID
            AND     DATEDIFF(DAY, d.[Diagnose_Date], e.[Exercise_Date]) BETWEEN -30 AND 30
            ORDER BY ABS(DATEDIFF(DAY, d.[Diagnose_Date], e.[Exercise_Date])) 
        ) e;

Example on SQL Fiddle

I did some more testing on this and found that the approach using ROW_NUMBER() is the most efficient:

WITH CTE AS
(   SELECT  d.ID,
            d.Diagnose_Date,
            d.Type1,
            d.SubType1, 
            e.Exercise_Date,
            e.Field1,
            e.Field2,
            e.Field3,
            RowNumber = ROW_NUMBER() OVER (PARTITION BY d.ID ORDER BY ABS(DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date])))
    FROM    Diagnose D
            LEFT JOIN Exercise E 
                ON D.ID = E.ID
)
SELECT  ID,
        Diagnose_Date,
        Type1,
        SubType1, 
        EID = ID,
        Exercise_Date,
        Field1,
        Field2,
        Field3
FROM    CTE
WHERE   RowNumber = 1;
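As a rough check of the ROW_NUMBER() idea, the sketch below runs the same pattern in an in-memory SQLite database from Python. Several things here are assumptions of the sketch rather than part of the answer: `julianday()` replaces DATEDIFF, window functions need SQLite 3.25 or later, only a few columns are kept, and the diagnosis with ID 4 is a hypothetical row added to show what the LEFT JOIN does when there is no exercise at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Diagnose (ID INTEGER, Diagnose_Date TEXT);
CREATE TABLE Exercise (ID INTEGER, Exercise_Date TEXT, Field1 TEXT);
INSERT INTO Diagnose VALUES
    (1, '2010-01-10'),
    (2, '2012-02-20'),
    (3, '2013-03-30'),
    (4, '2013-05-01');  -- hypothetical: a diagnosis with no exercise rows
INSERT INTO Exercise VALUES
    (1, '2010-01-01', 'x'),
    (2, '2012-02-10', 'a'),
    (2, '2012-04-01', 'e'),
    (3, '2013-03-01', 'x'),
    (3, '2013-04-05', 'a'),
    (3, '2013-06-01', 'x');
""")

# Number the exercises of each diagnosis by absolute distance in days,
# then keep only the nearest one (RowNumber = 1).
query = """
WITH CTE AS (
    SELECT d.ID, d.Diagnose_Date, e.Exercise_Date, e.Field1,
           ROW_NUMBER() OVER (
               PARTITION BY d.ID
               ORDER BY ABS(julianday(e.Exercise_Date) - julianday(d.Diagnose_Date))
           ) AS RowNumber
    FROM Diagnose d
    LEFT JOIN Exercise e ON d.ID = e.ID
)
SELECT ID, Diagnose_Date, Exercise_Date, Field1
FROM CTE
WHERE RowNumber = 1
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because ROW_NUMBER() assigns exactly one row the number 1 per partition, each diagnosis yields a single row even when two exercises are equally close, and the diagnosis without exercises is kept with NULL exercise columns.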

I compared the ROW_NUMBER() query with my first solution and with the answer that has the most votes. The results are as follows:

OUTER APPLY

Cost relative to batch: 34%
--------------------------------------------------
Table 'Exercise'. Scan count 3, logical reads 3
Table 'Diagnose'. Scan count 1, logical reads 1
--------------------------------------------------
Total. Scan count 4, logical reads 4

Self-join with aggregates (most votes so far)

Cost relative to batch: 51%
--------------------------------------------------
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Exercise'. Scan count 2, logical reads 4
Table 'Diagnose'. Scan count 2, logical reads 2
--------------------------------------------------
Total. Scan count 4, logical reads 6

ROW_NUMBER()

Cost relative to batch: 15%
--------------------------------------------------
Table 'Exercise'. Scan count 1, logical reads 3
Table 'Diagnose'. Scan count 1, logical reads 1
--------------------------------------------------
Total. Scan count 2, logical reads 4

Examples on SQL Fiddle

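So the ROW_NUMBER() solution has the lowest scan count, ties with OUTER APPLY for the fewest logical reads, and has the lowest estimated cost.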

Answer 3 (score: 0)

Using only standard SQL (this picks the latest exercise on or before the diagnosis date):

SELECT D.ID, D.Diagnose_Date, D.Type1, D.SubType1, E.Exercise_Date, E.Field1, E.Field2, E.Field3
FROM Diagnose D
LEFT JOIN Exercise E
ON E.ID=D.ID AND
   E.Exercise_Date=(SELECT MAX(Exercise_Date) FROM Exercise WHERE Exercise.ID=D.ID AND Exercise.Exercise_Date<=D.Diagnose_Date)
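The MAX(...) subquery with `Exercise_Date<=Diagnose_Date` selects the latest exercise on or before the diagnosis, which is not always the closest one. A small Python sketch, using the question's sample dates for ID 3, shows the two rules side by side; the variable names are just for illustration.

```python
from datetime import date

# Sample dates for ID 3 from the question (dd/mm/yyyy rewritten as date objects).
diagnose_date = date(2013, 3, 30)
exercise_dates = [date(2013, 3, 1), date(2013, 4, 5), date(2013, 6, 1)]

# Rule used by the MAX(...) subquery: latest exercise on or before the diagnosis.
latest_prior = max(d for d in exercise_dates if d <= diagnose_date)

# Rule the question asks for: smallest absolute gap, in either direction.
closest = min(exercise_dates, key=lambda d: abs((d - diagnose_date).days))

print(latest_prior)  # 2013-03-01 (29 days before the diagnosis)
print(closest)       # 2013-04-05 (6 days after the diagnosis)
```

For IDs 1 and 2 in the sample data both rules happen to agree, so the difference only shows up when the nearest exercise falls after the diagnosis date.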