SQL Server: join two tables based on the nearest value

Date: 2015-08-20 18:12:20

Tags: sql-server join

I have two sets of data, shown below. Table 1:

ID     Value_1
1      233.67
2      83.28
3      84.49
4      1234.83

Table 2:

NewID  Value_3     Value_4
5      NULL        83
6      NULL        85
7      NULL        235

I want to join these two tables so that the resulting data set looks like this:

ID     NewID     Value_1     Value_2
1      7         233.67      235
2      5         83.28       83
3      6         84.49       85
4      NULL      1234.83     NULL

I know that using the ROUND function will cause problems down the road. Does anyone know how to produce the result set above?
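To see why rounding alone falls short, here is a small Python sketch (not T-SQL) of a join on ROUND(Value_1, 0) = Value_4 over the sample rows. `round_join` is a hypothetical helper name. Only ID 2 finds its partner, because 233.67 rounds to 234 and 84.49 rounds to 84, and neither value exists in Table 2.

```python
# Sample rows from the question, keyed by ID / NewID
table1 = {1: 233.67, 2: 83.28, 3: 84.49, 4: 1234.83}
table2 = {5: 83, 6: 85, 7: 235}


def round_join(t1, t2):
    """Hypothetical equi-join on the rounded value: ROUND(Value_1, 0) = Value_4."""
    by_value = {v: k for k, v in t2.items()}  # Value_4 -> NewID
    # round() returns an int here; None means "no exact match after rounding"
    return {id_: by_value.get(round(v)) for id_, v in t1.items()}


print(round_join(table1, table2))
```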

2 answers:

Answer 0 (score: 0)

Something like this (FIRST_VALUE requires SQL Server 2012 or later):

DECLARE @Table1 TABLE
(
    Id int,
    Value1 float
)
INSERT INTO @Table1
VALUES
(1,      233.67),
(2,      83.28),
(3,      84.49),
(4,      1234.83)

DECLARE @Table2 TABLE
(
    NewId int,
    Value2 float
)

INSERT INTO @Table2
VALUES
(5,     83),
(6,     85),
(7,    235)

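-- For each Table1 row, order all Table2 rows by distance and take the closest one;
-- DISTINCT collapses the CROSS JOIN back to one row per Table1 row
SELECT DISTINCT 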
    t1.Id,
    FIRST_VALUE(t2.NewId) OVER (PARTITION BY t1.Id ORDER BY ABS(t1.Value1 - t2.Value2) ASC) AS NewId,
    t1.Value1,
    FIRST_VALUE(t2.Value2) OVER (PARTITION BY t1.Id ORDER BY ABS(t1.Value1 - t2.Value2) ASC) AS Value2
FROM @Table1 t1
CROSS JOIN @Table2 t2
ORDER BY t1.Id

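Note, however, that this also returns a match for ID = 4 (NewId 7, Value2 235), which your expected output does not show.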
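The CROSS JOIN plus FIRST_VALUE query is a brute-force nearest-neighbour search, because every Table1 row is compared with every Table2 row. Below is a Python sketch of the same logic (not T-SQL). `nearest_cross_join` is a hypothetical name. Ties are broken by dict order here, whereas the SQL leaves them unspecified.

```python
table1 = {1: 233.67, 2: 83.28, 3: 84.49, 4: 1234.83}
table2 = {5: 83.0, 6: 85.0, 7: 235.0}


def nearest_cross_join(t1, t2):
    """For every Table1 row, test every Table2 row and keep the closest one."""
    result = {}
    for id_, value_1 in t1.items():
        # Same ordering key as ORDER BY ABS(t1.Value1 - t2.Value2) in the window
        new_id = min(t2, key=lambda k: abs(value_1 - t2[k]))
        result[id_] = (new_id, t2[new_id])
    return result


for id_, (new_id, value_2) in sorted(nearest_cross_join(table1, table2).items()):
    print(id_, new_id, table1[id_], value_2)
```

The work grows with the product of the two table sizes, which is the cost the next answer tries to avoid.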

Answer 1 (score: 0)

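This approach avoids a cross join, which has to test every combination. It also works on SQL Server 2005 and later. For each record in Table2, it computes a lower and an upper bound at the midpoints between that record's Value_4 and its neighbouring values. It then joins Table1 wherever Value_1 falls between those midpoints. If Value_1 lands exactly on a boundary, this code rounds up, meaning it matches the higher of the two Value_4 values.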

--Load Table1
select 1 ID, convert(float,233.67) Value_1 into #Table1
insert into #Table1 select 2, 83.28
insert into #Table1 select 3, 84.49
insert into #Table1 select 4, 1234.83

--Load Table2
select 5 NewID, null Value_3, convert(float,83) Value_4 into #Table2
insert into #Table2 select 6, NULL, 85
insert into #Table2 select 7, NULL, 235

;with cte_Table2 as
(
select *, ROW_NUMBER() over (order by Value_4) OrderNum
from #Table2
)
Select #Table1.ID, 
    NewTable2.NewID, 
    #Table1.Value_1,
    NewTable2.Value_4 Value_2
from #Table1
full join
    (
    select Table2.NewID, 
        Table2.Value_3, 
        Table2.Value_4, 
        -- Bounds are the midpoints to the previous and next Value_4 (NULL at either end)
        Table2Prev.Value_4 + (Table2.Value_4 - Table2Prev.Value_4) / 2.0 LowerBound, 
        Table2.Value_4 + (Table2Next.Value_4 - Table2.Value_4) / 2.0 UpperBound
    from cte_Table2 Table2
    left join cte_Table2 Table2Prev
    on Table2.OrderNum = Table2Prev.OrderNum + 1
    left join cte_Table2 Table2Next
    on Table2.OrderNum = Table2Next.OrderNum - 1
    ) NewTable2
on (#Table1.Value_1 < UpperBound or UpperBound is null)
    and (#Table1.Value_1 >= LowerBound or LowerBound is null)
order by 1
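The midpoint idea can also be sketched in Python (again not T-SQL). `midpoint_join` is a hypothetical name. A sort plus a binary search over the midpoints replaces the per-pair comparison. Unlike the FULL JOIN, this sketch does not list Table2 rows that nothing matched.

```python
from bisect import bisect_right

table1 = {1: 233.67, 2: 83.28, 3: 84.49, 4: 1234.83}
table2 = {5: 83.0, 6: 85.0, 7: 235.0}


def midpoint_join(t1, t2):
    """Assign each Value_1 to the Table2 interval [LowerBound, UpperBound) it falls in."""
    if not t2:
        return {id_: None for id_ in t1}
    # Sort Table2 by Value_4, as ROW_NUMBER() OVER (ORDER BY Value_4) does
    ordered = sorted(t2.items(), key=lambda kv: kv[1])
    values = [v for _, v in ordered]
    # Midpoints between neighbours: row i covers [mids[i-1], mids[i])
    mids = [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    # bisect_right counts midpoints <= Value_1, so a value exactly on a
    # midpoint goes to the higher row ("rounds up", as in the SQL)
    return {id_: ordered[bisect_right(mids, v)] for id_, v in t1.items()}


print(midpoint_join(table1, table2))
```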

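I notice that your expected output shows no match for ID 4. If you need some kind of range to rule out matches, you have to add that restriction to the join's ON condition. For example, you can exclude values more than 10 apart by keeping the FULL JOIN and adding this to the ON condition. In a WHERE clause, the same condition would remove those rows from the result entirely.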

and abs(#Table1.Value_1-NewTable2.Value_4) < 10.0
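The effect of that extra ON condition is shown in the short Python sketch below. The threshold of 10 is the example value from the text, and `nearest_within` is a hypothetical helper. ID 4 keeps its row but loses its match, which reproduces the expected output from the question.

```python
table1 = {1: 233.67, 2: 83.28, 3: 84.49, 4: 1234.83}
table2 = {5: 83.0, 6: 85.0, 7: 235.0}


def nearest_within(t1, t2, threshold=10.0):
    """Nearest Table2 row per Table1 row, or None when it is threshold or more away."""
    result = {}
    for id_, value_1 in t1.items():
        new_id = min(t2, key=lambda k: abs(value_1 - t2[k]))
        # Checked as part of the match (like the ON clause), so the Table1
        # row is kept with None instead of being dropped
        result[id_] = new_id if abs(value_1 - t2[new_id]) < threshold else None
    return result


print(nearest_within(table1, table2))
```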

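Perhaps what you actually want, though, is a join in which a value in Table1 is matched not only to the closest value in Table2, but in which that Table2 value is also the closest one to it among the values in Table1. In that case you have to build a subquery with boundary conditions for Table1 as well, and then check that the match is also the closest one from that side. Like this: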

;with cte_Table1 as
(
select *, ROW_NUMBER() over (order by Value_1) OrderNum
from #Table1
), 
cte_Table2 as
(
select *, ROW_NUMBER() over (order by Value_4) OrderNum
from #Table2
)
Select NewTable1.ID, 
    NewTable2.NewID, 
    NewTable1.Value_1,
    NewTable2.Value_4 Value_2
from
    (
    select Table1.ID,
        Table1.Value_1, 
        Table1Prev.Value_1 + (Table1.Value_1 - Table1Prev.Value_1) / 2.0 LowerBound, 
        Table1.Value_1 + (Table1Next.Value_1 - Table1.Value_1) / 2.0 UpperBound
    from cte_Table1 Table1
    left join cte_Table1 Table1Prev
    on Table1.OrderNum = Table1Prev.OrderNum + 1
    left join cte_Table1 Table1Next
    on Table1.OrderNum = Table1Next.OrderNum - 1
    ) NewTable1
full join
    (
    select Table2.NewID, 
        Table2.Value_3, 
        Table2.Value_4, 
        Table2Prev.Value_4 + (Table2.Value_4 - Table2Prev.Value_4) / 2.0 LowerBound, 
        Table2.Value_4 + (Table2Next.Value_4 - Table2.Value_4) / 2.0 UpperBound
    from cte_Table2 Table2
    left join cte_Table2 Table2Prev
    on Table2.OrderNum = Table2Prev.OrderNum + 1
    left join cte_Table2 Table2Next
    on Table2.OrderNum = Table2Next.OrderNum - 1
    ) NewTable2
on (NewTable1.Value_1 < NewTable2.UpperBound or NewTable2.UpperBound is null)
    and (NewTable1.Value_1 >= NewTable2.LowerBound or NewTable2.LowerBound is null)
    and (NewTable2.Value_4 < NewTable1.UpperBound or NewTable1.UpperBound is null)
    and (NewTable2.Value_4 >= NewTable1.LowerBound or NewTable1.LowerBound is null)
order by 1

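Now records are matched only when their values are closest to each other from both sides. This ensures that each record in Table1 matches at most one value in Table2, and vice versa. Because this is a full join, unmatched rows from either table appear with NULLs on the other side. Which side ends up with more NULLs depends on which table is smaller.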
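The mutual-nearest rule can be expressed directly in Python as a cross-check. This is a sketch, not T-SQL. It uses plain distance comparisons instead of the midpoint bounds, so exact ties may be resolved differently. `mutual_nearest_join` is a hypothetical name, and both tables are assumed to be non-empty.

```python
table1 = {1: 233.67, 2: 83.28, 3: 84.49, 4: 1234.83}
table2 = {5: 83.0, 6: 85.0, 7: 235.0}


def nearest(value, candidates):
    """Key of the candidate whose value is closest to `value` (candidates must be non-empty)."""
    return min(candidates, key=lambda k: abs(value - candidates[k]))


def mutual_nearest_join(t1, t2):
    """Pair rows only when each is the other's nearest value (FULL JOIN style)."""
    rows, matched2 = [], set()
    for id_, v1 in sorted(t1.items()):
        new_id = nearest(v1, t2)
        if nearest(t2[new_id], t1) == id_:  # closest from both sides
            rows.append((id_, new_id))
            matched2.add(new_id)
        else:
            rows.append((id_, None))
    # Table2 rows that nobody matched come back with a NULL (None) ID
    rows += [(None, k) for k in sorted(t2) if k not in matched2]
    return rows


print(mutual_nearest_join(table1, table2))
```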

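One more thing: this code may not handle duplicate values the way you need. If Value_1 or Value_4 contains several identical values, a value will match one of them but not all of them. If you want it to match all of them, you have to change your common table expressions to: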

;with cte_Table1 as
(
select *, ROW_NUMBER() over (order by Value_1) OrderNum
from (select distinct Value_1 from #Table1) tbl1
), 
cte_Table2 as
(
select *, ROW_NUMBER() over (order by Value_4) OrderNum
from (select distinct Value_4 from #Table2) tbl2
)

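Then update the subqueries so that they output only the values with their bounds. This gives you the best match between the unique values. You can then join Table1 and Table2 ON Table1.Value_1 = NewTable1.Value_1 and Table2.Value_4 = NewTable2.Value_4 to get the other fields.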
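The Python sketch below follows that two-step plan. It first matches the distinct values, then joins the matches back to the rows. This is not T-SQL. `match_distinct_values` is a hypothetical name, and the duplicate 85 in Table2 is made-up data. For brevity the sketch keeps only the Table1 side, like a LEFT JOIN, and it uses plain distance comparisons in place of the midpoint bounds.

```python
table1 = {1: 233.67, 2: 83.28, 3: 84.49, 4: 1234.83}
table2 = {5: 83.0, 6: 85.0, 7: 235.0, 8: 85.0}  # hypothetical duplicate: 6 and 8 share 85


def closest(x, pool):
    """Value in `pool` nearest to x."""
    return min(pool, key=lambda v: abs(x - v))


def match_distinct_values(t1, t2):
    """Mutual-nearest match on DISTINCT values, then join back to every row."""
    vals1 = sorted(set(t1.values()))
    vals2 = sorted(set(t2.values()))
    # Best match between the unique values (the rewritten CTEs + bounds step)
    pairs = {}
    for v1 in vals1:
        v2 = closest(v1, vals2)
        if closest(v2, vals1) == v1:
            pairs[v1] = v2
    # Join back on Value_1 / Value_4 to recover ID and NewID
    rows = []
    for id_, v1 in sorted(t1.items()):
        partners = sorted(k for k, v in t2.items() if pairs.get(v1) == v)
        if partners:
            rows.extend((id_, k) for k in partners)
        else:
            rows.append((id_, None))
    return rows


print(match_distinct_values(table1, table2))
```

ID 3 now pairs with both rows that hold 85, instead of with just one of them.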

Of course, if you have very large tables, some optimizations are possible, such as breaking some of these subqueries out into indexed temp tables.