How do I join the most recent row in one table to another table?

Asked: 2009-01-30 22:31:35

Tags: sql date join greatest-n-per-group

I have data that looks like this:

entities
id         name
1          Apple
2          Orange
3          Banana

A process runs periodically and gives each entity a score. The process generates the data and adds it to a scores table, like so:

scores
id  entity_id  score  date_added
1   1          10     1/2/09
2   2          10     1/2/09
3   1          15     1/3/09
4   2          10     1/3/09
5   1          15     1/4/09
6   2          15     1/4/09
7   3          22     1/4/09

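I want to be able to select all of the entities along with the most recently recorded score for each one, resulting in data like this: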

entities
id  name    score  date_added
1   Apple   15     1/4/09
2   Orange  15     1/4/09
3   Banana  22     1/4/09

I can get the data for a single entity using this query:

SELECT entities.*,
       scores.score,
       scores.date_added
FROM entities
INNER JOIN scores
    ON entities.id = scores.entity_id
WHERE entities.id = ?
ORDER BY scores.date_added DESC
LIMIT 1

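But I'm at a loss as to how to select the same thing for all entities. Perhaps it's staring me in the face?

Thank you very much for taking the time.

Thanks for the responses. I'll give it a few days to see whether a preferred solution bubbles up, and then I'll select the answer.

UPDATE: I've tried several of the proposed solutions. The main issue I'm facing now is that an entity that doesn't yet have a generated score doesn't appear in the list.

What would the SQL look like to ensure that all entities are returned, even if they don't have any score posted yet?

UPDATE: Answer selected. Thanks, everyone!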

7 Answers:

Answer 0 (score: 63)

I do it this way:

SELECT e.*, s1.score, s1.date_added 
FROM entities e
  INNER JOIN scores s1
    ON (e.id = s1.entity_id)
  LEFT OUTER JOIN scores s2
    ON (e.id = s2.entity_id AND s1.id < s2.id)
WHERE s2.id IS NULL;
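
For the follow-up in the question (entities that have no score yet), one possible variant is to turn the first join into a LEFT OUTER JOIN as well. The following is only a sketch under stated assumptions: the column types, the ISO date literals (reading the question's dates as month/day/year), and the extra 'Cherry' entity with no scores are illustrative and not part of the original data. Like the query above, it assumes that a higher scores.id means a more recent score.

```sql
-- Hypothetical setup mirroring the question's tables (types are assumptions).
CREATE TABLE entities (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE scores (
    id INT PRIMARY KEY,
    entity_id INT,
    score INT,
    date_added DATE
);

INSERT INTO entities (id, name) VALUES
    (1, 'Apple'), (2, 'Orange'), (3, 'Banana'),
    (4, 'Cherry');  -- illustrative entity with no scores

INSERT INTO scores (id, entity_id, score, date_added) VALUES
    (1, 1, 10, '2009-01-02'),
    (2, 2, 10, '2009-01-02'),
    (3, 1, 15, '2009-01-03'),
    (4, 2, 10, '2009-01-03'),
    (5, 1, 15, '2009-01-04'),
    (6, 2, 15, '2009-01-04'),
    (7, 3, 22, '2009-01-04');

-- s1 is a candidate "latest" row; s2 looks for any later row of the same entity.
-- Keeping only rows where no s2 exists leaves the latest s1 per entity.
-- Because s1 is LEFT-joined, entities without scores survive with NULLs.
SELECT e.*, s1.score, s1.date_added
FROM entities e
  LEFT OUTER JOIN scores s1
    ON e.id = s1.entity_id
  LEFT OUTER JOIN scores s2
    ON e.id = s2.entity_id AND s1.id < s2.id
WHERE s2.id IS NULL
ORDER BY e.id;
```

With the sample rows above this should return Apple and Orange with a score of 15, Banana with 22 (all dated 2009-01-04), and Cherry with NULL in both score columns.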

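Answer 1 (score: 9)

Just to add my variation:

SELECT e.*, s1.score
FROM entities e
INNER JOIN scores s1 ON e.id = s1.entity_id
WHERE NOT EXISTS (
    -- no later score row exists for the same entity
    SELECT 1 FROM scores s2
    WHERE s2.entity_id = s1.entity_id AND s2.id > s1.id
)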

Answer 2 (score: 5)

Approach 1

SELECT entities.*,
       scores.score,
       scores.date_added
FROM entities
INNER JOIN scores
    ON entities.id = scores.entity_id
WHERE scores.date_added =
    (SELECT MAX(date_added) FROM scores WHERE entity_id = entities.id)

Answer 3 (score: 3)

Approach 2

Query cost relative to the batch:


SELECT entities.*,
       scores.score,
       scores.date_added
FROM entities
INNER JOIN scores
    ON entities.id = scores.entity_id
INNER JOIN
    (
    SELECT entity_id, MAX(date_added) AS recent_date
    FROM scores
    GROUP BY entity_id
    ) AS y
    ON entities.id = y.entity_id AND scores.date_added = y.recent_date
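
One caveat applies to both date-based approaches (the correlated MAX(date_added) subquery and the grouped derived table): if an entity has two score rows with the same latest date_added, both rows come back. A minimal sketch of one way to break such ties, assuming scores.id is unique and that the database supports LIMIT (as the question's own query does), is to pick the single winning row by its id:

```sql
-- For each entity, the subquery returns the id of exactly one score row:
-- the newest by date_added, with the higher id winning on equal dates.
SELECT e.*, s.score, s.date_added
FROM entities e
INNER JOIN scores s
    ON e.id = s.entity_id
WHERE s.id = (
    SELECT s2.id
    FROM scores s2
    WHERE s2.entity_id = e.id
    ORDER BY s2.date_added DESC, s2.id DESC
    LIMIT 1
);
```

Using id as the tie-breaker is a design choice for this sketch; any column that is unique per entity would serve the same purpose.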

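Answer 4 (score: 3)

I know this is an old question, but I thought I'd add a method that hasn't been mentioned yet: CROSS APPLY / OUTER APPLY. These are available in SQL Server 2005 or later (the database type isn't tagged in this question).

Using table variables as sample data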

DECLARE @Entities TABLE(Id INT PRIMARY KEY, name NVARCHAR(MAX))
INSERT INTO @Entities
VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Banana'), (4, 'Cherry')

DECLARE @Scores TABLE(Id INT PRIMARY KEY, Entity_Id INT, Score INT, Date_Added DATE)
INSERT INTO @Scores
VALUES (1,1,10,'2009-02-01'),
(2,2,10,'2009-02-01'),
(3,1,15,'2009-03-01'),
(4,2,10,'2009-03-01'),
(5,1,15,'2009-04-01'),
(6,2,15,'2009-04-01'),
(7,3,22,'2009-04-01')

you could use

SELECT E.Id, E.name, S.Score, S.Date_Added 
FROM @Entities E
CROSS APPLY
(
    SELECT TOP 1 * 
    FROM @Scores Sc 
    WHERE Sc.Entity_Id = E.Id  
    ORDER BY Sc.Date_Added DESC  -- latest row, not the highest score
) AS S

to get the desired results. The equivalent that also returns entities without scores would be

SELECT E.Id, E.name, S.Score, S.Date_Added 
FROM @Entities E
OUTER APPLY
(
    SELECT TOP 1 * 
    FROM @Scores Sc 
    WHERE Sc.Entity_Id = E.Id  
    ORDER BY Sc.Date_Added DESC  -- latest row, not the highest score
) AS S

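Answer 5 (score: 1)

SELECT entities.*,
       scores.score,
       scores.date_added
FROM entities
INNER JOIN scores
    ON entities.id = scores.entity_id
-- keep only the score rows dated on the entity's latest date_added;
-- an aggregate such as MAX() has to live in its own subquery, not in WHERE
WHERE scores.id IN
    (SELECT s2.id
     FROM scores s2
     WHERE s2.entity_id = entities.id
       AND s2.date_added = (SELECT MAX(s3.date_added)
                            FROM scores s3
                            WHERE s3.entity_id = entities.id))
-- no LIMIT 1 here: it would cut the result to a single entity
ORDER BY scores.date_added DESC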

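Answer 6 (score: 1)

You can now also use window functions such as ROW_NUMBER, which are available in most RDBMSs (Oracle, PostgreSQL, SQL Server), to write this as a fairly natural query: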

SELECT id, name, score, date_added FROM (
 SELECT e.id, e.name, s.score, s.date_added,
 ROW_NUMBER() OVER (PARTITION BY e.id ORDER BY s.date_added DESC) rn
 FROM Entities e INNER JOIN Scores s ON e.id = s.entity_id
) tmp WHERE rn = 1;
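
To also return entities that have no score yet (the follow-up in the question), the same idea can be sketched with a LEFT JOIN. This is an illustrative variant rather than part of the original answer; it assumes the question's entities and scores tables, and the extra s.id DESC ordering is an assumed tie-breaker for rows that share the same date_added.

```sql
-- Number each entity's score rows from newest to oldest and keep row 1.
-- An entity without scores yields a single LEFT JOIN row with NULL score
-- columns; that row is numbered 1 and is therefore kept.
SELECT id, name, score, date_added
FROM (
    SELECT e.id, e.name, s.score, s.date_added,
           ROW_NUMBER() OVER (
               PARTITION BY e.id
               ORDER BY s.date_added DESC, s.id DESC
           ) AS rn
    FROM entities e
    LEFT JOIN scores s
        ON e.id = s.entity_id
) tmp
WHERE rn = 1
ORDER BY id;
```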
