Joining tables on the maximum ID

Time: 2018-11-15 18:57:27

Tags: sql sql-server tsql join

I found three questions that all seem to ask something similar:

Getting max value from rows and joining to another table

Select only rows by join tables max value

Joining tables based on the maximum value

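But I'm having a hard time working out exactly how to join the tables while keeping only the max row from one of them, when the max value is in the id or index field itself.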

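I'm looking for answers that need only joins, because that would let the solution work in a tool that generates queries, where the corresponding joins are easy to produce, although a subquery might also be workable with more effort. I found the following answer particularly interesting: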

SELECT DISTINCT b.id, b.filename, a1.name
FROM a a1
JOIN b
  ON b.id = a1.id
LEFT JOIN a a2
  ON a2.id = a1.id
  AND a2.rank > a1.rank
WHERE a2.id IS NULL

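In my case, however, the rank column is also the index, e.g. "ID". I can't compare the same column for equality and for greater-than at the same time, because both conditions can never be true together!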

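Also, possibly complicating matters, the typical query where I need this may join several tables (3-5 is not uncommon). So, as a simplified example of my query: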

SELECT
    table1.field1, table1.field2, table1.field3,
    table2.field1, table2.field2, table2.field3,
    table3.field1, table3.field2, table3.field3,
    table4.field1, table4.field2, table4.field3
FROM table1
INNER JOIN table2 ON
    table1.field1 = table2.field1
    AND table1.field2 = table2.field2
    AND table2.field3 < 0
INNER JOIN table3 ON
    table2.field1 = table3.field1
    AND table2.field4 = table3.field4
INNER JOIN table4 ON
    table1.field1 = table4.field1
    AND table1.field2 = table4.field2

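What I'd like to do is eliminate the duplicates from table3 by keeping only the row with the maximum ID (e.g. MAX(table3.id)) for each unique combination of all the other fields. That is, the query above returns something like this: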

+-------+-------+-------+---------+
| table1| table2| table4|table3   |
+-------+-------+-------+---------+
|  A    |   A   |   A   | 1,...   |
|  A    |   A   |   A   | 2,...   |
|  A    |   A   |   A   | 3,...   |
|  A    |   A   |   A   | MAX1,...|
|  B    |   B   |   B   | 1,...   |
|  B    |   B   |   B   | 2,...   |
|  B    |   B   |   B   | 3,...   |
|  B    |   B   |   B   | MAX2,...|
+-------+-------+-------+---------+

(I'm using A and B only to indicate that, for a given set of rows, all the fields from table1, table2, and table4 have identical values.)

and I'd like to reduce that to:

+-------+-------+-------+---------+
| table1| table2| table4|table3   |
+-------+-------+-------+---------+
|  A    |   A   |   A   | MAX1,...|
|  B    |   B   |   B   | MAX2,...|
+-------+-------+-------+---------+

1 answer:

Answer 0: (score: 2)

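You can add a derived table that reduces the matching rows in TABLE3 to one per group. Another approach would use window functions, but you asked for JOINs only.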

SELECT
    table1.field1, table1.field2, table1.field3,
    table2.field1, table2.field2, table2.field3,
    table3.field1, table3.field2, table3.field3,
    table4.field1, table4.field2, table4.field3
FROM table1
INNER JOIN table2 ON
    table1.field1 = table2.field1
    AND table1.field2 = table2.field2
    AND table2.field3 < 0
INNER JOIN table3 ON
    table2.field1 = table3.field1
    AND table2.field4 = table3.field4

--here is the added derived table. Change column names as needed
--the max is aliased as mx so that it matches x.mx in the join condition
INNER JOIN (select UID, mx = max(ID) from Table3 group by UID) x
    on x.UID = table3.UID and x.mx = table3.ID

INNER JOIN table4 ON
    table1.field1 = table4.field1
    AND table1.field2 = table4.field2
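
For comparison, the window-function alternative mentioned above might look roughly like the sketch below. It is not join-only, so it may not suit a query-generating tool. The table variable @table3, its columns, and the sample rows are hypothetical stand-ins, and it assumes that (field1, field4) is what identifies a group.

```sql
-- Hypothetical stand-in for table3; column names and data are assumptions.
declare @table3 table (id int identity(1,1), field1 char(1), field4 char(1));

insert into @table3 (field1, field4)
values ('a','x'), ('a','x'), ('a','x'), ('b','y'), ('b','y');

-- Number the rows in each (field1, field4) group from highest id to lowest,
-- then keep only the first row of each group, i.e. the one with max(id).
select ranked.id, ranked.field1, ranked.field4
from (
    select id, field1, field4,
           rn = row_number() over (partition by field1, field4 order by id desc)
    from @table3
) ranked
where ranked.rn = 1;
```

With this sample data the query keeps the rows with id 3 and id 5. In the full query, the inner select would take the place of table3 in the join.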

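Or perhaps something like the following, which groups by the join columns instead. It really depends on your schema, and the sample data is hard to follow.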

INNER JOIN (select field1, field4, mx = max(ID) from Table3 group by field1, field4) x
    on x.field1 = table3.field1 and x.field4 = table3.field4 and x.mx = table3.ID

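Here is an example. You'll notice that the last three rows have the same (f1, f2) pair. You want only the last of them, which is the max(id) for that grouping. Whatever makes a row unique relative to the rest of the data (not the primary key, but the columns you are joining on) is what you want to include in both the derived table and the join condition.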

declare @table table (id int identity(1,1), f1 char(1), f2 char(1))
insert into @table
values
('a','b'),
('a','c'),
('a','a'),
('b','b'),
('b','b'),
('b','b')

select * from @table

select t1.*
from @table t1
inner join 
    (select f1, f2, mx = max(id) from @table group by f1, f2) t2 on
    t1.f1 = t2.f1
    and t1.f2 = t2.f2
    and t1.id = t2.mx
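
If a derived table is awkward for the query generator, the LEFT JOIN pattern quoted in the question can probably be adapted as well. The sketch below assumes the same sample data, held in a separately named table variable @sample (a hypothetical name, chosen so that it does not clash with @table above). The equality comparisons go on the grouping columns and the greater-than comparison goes on id, so the two conditions never apply to the same column.

```sql
-- Same sample rows as above, in a separately named table variable (assumption).
declare @sample table (id int identity(1,1), f1 char(1), f2 char(1));

insert into @sample (f1, f2)
values ('a','b'), ('a','c'), ('a','a'), ('b','b'), ('b','b'), ('b','b');

-- For each row, look for another row in the same (f1, f2) group with a higher id.
-- If none exists (t2.id is null), the row holds the max id of its group.
select t1.*
from @sample t1
left join @sample t2
    on  t2.f1 = t1.f1
    and t2.f2 = t1.f2
    and t2.id > t1.id
where t2.id is null;
```

This returns the rows with id 1, 2, 3, and 6, the same rows as the derived-table query above. On large tables the self-join can be more expensive than grouping, so it is worth comparing the execution plans.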