Given this dataset:
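+----+-----------------+-------------+-----------+
| ID | Name            | City        | Birthyear |
+----+-----------------+-------------+-----------+
|  1 | Egon Spengler   | New York    |      1957 |
|  2 | Mac Taylor      | New York    |      1955 |
|  3 | Sarah Connor    | Los Angeles |      1959 |
|  4 | Jean-Luc Picard | La Barre    |      2305 |
|  5 | Ellen Ripley    | Nostromo    |      2092 |
|  6 | James T. Kirk   | Riverside   |      2233 |
|  7 | Henry Jones     | Chicago     |      1899 |
+----+-----------------+-------------+-----------+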
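I need to find the 3 oldest people, but only one per city.
If it were just the three oldest, it would be ...
However, since Egon Spengler and Mac Taylor both live in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.
Any elegant solutions?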
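To pin down the requirement before looking at SQL, here is a minimal plain-Python sketch (not from the thread). It keeps the oldest person per city and then takes the three oldest of those. The function name `oldest_per_city` is hypothetical. When two people in one city tie, the sketch keeps whichever row comes first; that is an arbitrary choice.

```python
# Hypothetical reference implementation of "3 oldest, at most one per city".
people = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]


def oldest_per_city(rows, n=3):
    """Return the n oldest people, keeping at most one person per city."""
    best = {}  # city -> row with the smallest Birthyear seen so far
    for row in rows:
        _id, _name, city, year = row
        # Strict "<" keeps the first row seen when two people in a city tie.
        if city not in best or year < best[city][3]:
            best[city] = row
    # Smallest Birthyear means oldest, so sort ascending and cut off at n.
    return sorted(best.values(), key=lambda r: r[3])[:n]


for _id, name, city, year in oldest_per_city(people):
    print(f"{name} / {city} ({year})")
```

Egon Spengler drops out here because Mac Taylor is older and also lives in New York.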
Update
Currently, PConroy's variant is the best/fastest solution:
SELECT P.*, COUNT(*) AS ct
FROM people P
JOIN (SELECT MIN(Birthyear) AS Birthyear
FROM people
GROUP by City) P2 ON P2.Birthyear = P.Birthyear
GROUP BY P.City
ORDER BY P.Birthyear ASC
LIMIT 10;
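His original query with IN was extremely slow on a large dataset (I aborted it after 5 minutes), but moving the subquery into a JOIN speeds it up considerably. It takes about 0.15 seconds with 1 million rows in my test environment. I have an index on (City, Birthyear) and a second one on Birthyear alone.
Note: this is related to ...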
Answer 0 (score: 18)
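This is probably not the most elegant solution, and the performance of IN may suffer on larger tables.
The nested query gets the minimum Birthyear for each city. Only records whose Birthyear is one of those values are matched in the outer query. Ordering by age (oldest first) and then limiting to 3 results gives you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out...).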
SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM people
WHERE Birthyear IN (SELECT MIN(Birthyear)
                    FROM people
                    GROUP BY City)
GROUP BY City
ORDER BY Birthyear ASC  -- smallest Birthyear = oldest
LIMIT 3;
+-----------------+-------------+-----------+----+
| Name            | City        | Birthyear | ct |
+-----------------+-------------+-----------+----+
| Henry Jones     | Chicago     |      1899 |  1 |
| Mac Taylor      | New York    |      1955 |  1 |
| Sarah Connor    | Los Angeles |      1959 |  1 |
+-----------------+-------------+-----------+----+
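Edit: I added GROUP BY City to the outer query, because otherwise people who share a birth year would produce multiple rows per city. Grouping the outer query ensures that only one result per city is returned if several people share that minimum Birthyear. The ct column shows whether more than one person in the city has that Birthyear.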
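The sketch below shows what the ct column reports. It runs the edited query against an in-memory SQLite database instead of MySQL; that substitution is an assumption, though SQLite also accepts non-aggregated columns alongside GROUP BY. It uses ORDER BY Birthyear ASC so the oldest come first. It also adds the "Blah" row, borrowed from the test data in a later answer, who shares Mac Taylor's 1955 birth year in New York.

```python
import sqlite3

# In-memory SQLite standing in for the asker's MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER, Name TEXT, City TEXT, Birthyear INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Blah", "New York", 1955),  # ties with Mac Taylor in New York
    ],
)

rows = conn.execute(
    """
    SELECT Name, City, Birthyear, COUNT(*) AS ct
    FROM people
    WHERE Birthyear IN (SELECT MIN(Birthyear) FROM people GROUP BY City)
    GROUP BY City
    ORDER BY Birthyear ASC
    LIMIT 3
    """
).fetchall()

for name, city, year, ct in rows:
    # When ct > 1 the Name comes from an arbitrary row of the group.
    print(city, year, ct)
```

New York comes back once, with ct = 2. Which of the two tied names appears is not guaranteed.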
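Answer 1 (score: 3)
This may not be the most elegant or the fastest solution, but it should work. I'm looking forward to seeing solutions from the real database gurus.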
select p.* from people p,
    (select city, min(birthyear) as minyear from people group by city) t
where p.city = t.city and p.birthyear = t.minyear
order by p.birthyear asc
limit 3
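Below is a sketch of this derived-table approach, run on an in-memory SQLite database instead of MySQL (an assumption). It assumes the question's Birthyear column takes the place of age, so the oldest person has the smallest birthyear. It also uses an explicit JOIN ... ON instead of the comma join, which produces the same rows.

```python
import sqlite3

# In-memory SQLite standing in for the asker's MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, city TEXT, birthyear INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
    ],
)

# Derived table t holds each city's earliest birth year. Joining on both
# city and year keeps only the oldest person (or people, on a tie) per city.
rows = conn.execute(
    """
    SELECT p.name, p.city, p.birthyear
    FROM people p
    JOIN (SELECT city, MIN(birthyear) AS minyear
          FROM people
          GROUP BY city) t
      ON p.city = t.city AND p.birthyear = t.minyear
    ORDER BY p.birthyear ASC
    LIMIT 3
    """
).fetchall()
print(rows)
```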
Answer 2 (score: 2)
Something like this?
SELECT
Id, Name, City, Birthyear
FROM
TheTable
WHERE
Id IN (SELECT TOP 1 Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY Birthyear)
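SELECT TOP 1 is SQL Server syntax. The sketch below ports the same correlated-subquery idea to SQLite, which uses LIMIT 1 instead. Several details are assumptions, not part of the answer:

- the table name people in place of TheTable,
- an Id tie-breaker inside the subquery,
- an outer ORDER BY/LIMIT 3 to match the question.

The "Blah" row from a later answer's test data is included to exercise the tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Id INTEGER, Name TEXT, City TEXT, Birthyear INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Blah", "New York", 1955),  # same city and year as Mac Taylor
    ],
)

# For each outer row o, the correlated subquery picks the Id of the oldest
# person in o's city (lowest Id on a tie). o survives only if it is that row.
rows = conn.execute(
    """
    SELECT o.Id, o.Name, o.City, o.Birthyear
    FROM people o
    WHERE o.Id = (SELECT i.Id
                  FROM people i
                  WHERE i.City = o.City
                  ORDER BY i.Birthyear, i.Id
                  LIMIT 1)
    ORDER BY o.Birthyear
    LIMIT 3
    """
).fetchall()
print(rows)
```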
Answer 3 (score: 1)
Not pretty, but it should also work when several people share the same dob:
Test data:
select id, name, city, dob
into people
from
(select 1 id,'Egon Spengler' name, 'New York' city , 1957 dob
union all select 2, 'Mac Taylor','New York', 1955
union all select 3, 'Sarah Connor','Los Angeles', 1959
union all select 4, 'Jean-Luc Picard','La Barre', 2305
union all select 5, 'Ellen Ripley','Nostromo', 2092
union all select 6, 'James T. Kirk','Riverside', 2233
union all select 7, 'Henry Jones','Chicago', 1899
union all select 8, 'Blah','New York', 1955) a
Query:
select
    *
from
    people p
    left join people p1
        on p.city = p1.city
        and ((p.dob > p1.dob and p.id <> p1.id)
             or (p.dob = p1.dob and p.id > p1.id))
where
    p1.id is null
order by
    p.dob
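In SQL, AND binds more tightly than OR. If the two OR branches of this join condition are not wrapped in their own parentheses, the tie-breaking branch (same dob, lower id) no longer requires the same city. The SQLite sketch below compares the two forms; SQLite stands in for SQL Server here, which is an assumption. It uses the answer's test data plus one hypothetical Boston row that shares Mac Taylor's dob. The helper `survivors` is hypothetical, and it interpolates only fixed strings into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, city TEXT, dob INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Blah", "New York", 1955),
        (9, "Hypothetical Person", "Boston", 1955),  # hypothetical extra row
    ],
)


def survivors(join_condition):
    """Anti-join: keep rows p for which no 'better' p1 exists."""
    sql = f"""
        SELECT p.name
        FROM people p
        LEFT JOIN people p1 ON {join_condition}
        WHERE p1.id IS NULL
        ORDER BY p.dob, p.id
    """
    return [name for (name,) in conn.execute(sql)]


# Without inner parentheses: parsed as (city AND older) OR (tie), city ignored on ties.
unparenthesised = survivors(
    "p.city = p1.city AND (p.dob > p1.dob AND p.id <> p1.id)"
    " OR (p.dob = p1.dob AND p.id > p1.id)"
)
# With inner parentheses: both branches require the same city.
parenthesised = survivors(
    "p.city = p1.city AND ((p.dob > p1.dob AND p.id <> p1.id)"
    " OR (p.dob = p1.dob AND p.id > p1.id))"
)
print(unparenthesised)
print(parenthesised)
```

Without the parentheses, the Boston row is eliminated by New Yorkers who share its dob. With them, it survives as the oldest (and only) person in Boston.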
Answer 4 (score: 1)
@BlaM
Updated: I just found that using USING instead of ON works fine. It removes the duplicate columns from the result.
SELECT P.*, COUNT(*) AS ct
FROM people P
JOIN (SELECT City, MIN(Birthyear) AS Birthyear
FROM people
GROUP by City) P2 USING(Birthyear, City)
GROUP BY P.City
ORDER BY P.Birthyear ASC
LIMIT 10;
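The sketch below shows the duplicate-column difference between USING and ON. It runs on an in-memory SQLite database instead of MySQL (an assumption); SQLite likewise omits the right-hand copy of each USING column from SELECT *. To isolate that effect, it leaves out the answer's GROUP BY/COUNT(*) and orders with LIMIT 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER, Name TEXT, City TEXT, Birthyear INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
    ],
)

# Derived table with each city's earliest birth year, shared by both queries.
p2 = "(SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP BY City) P2"

using_cur = conn.execute(
    f"SELECT * FROM people P JOIN {p2} USING (Birthyear, City) "
    "ORDER BY P.Birthyear LIMIT 3"
)
using_cols = [d[0] for d in using_cur.description]
using_rows = using_cur.fetchall()

on_cur = conn.execute(
    f"SELECT * FROM people P JOIN {p2} "
    "ON P2.Birthyear = P.Birthyear AND P2.City = P.City "
    "ORDER BY P.Birthyear LIMIT 3"
)
on_cols = [d[0] for d in on_cur.description]
on_rows = on_cur.fetchall()

print(using_cols)  # City and Birthyear appear once
print(on_cols)     # City and Birthyear appear twice
```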
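Original post
Hi, I tried your updated query but got wrong results until I added an extra condition to the join (and also selected the extra column in the joined subquery). Applied to your query, this is what I use: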
SELECT P.*, COUNT(*) AS ct
FROM people P
JOIN (SELECT City, MIN(Birthyear) AS Birthyear
FROM people
GROUP by City) P2 ON P2.Birthyear = P.Birthyear AND P2.City = P.City
GROUP BY P.City
ORDER BY P.Birthyear ASC
LIMIT 10;
In theory you shouldn't need the final GROUP BY P.City, but I'm leaving it in for now, just in case. I may remove it later.
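The sketch below shows why the extra City condition matters. When the derived table returns only birth years, a person can match another city's minimum year. Here a hypothetical New Yorker born in 1959 matches Los Angeles' minimum. It runs on an in-memory SQLite database instead of MySQL (an assumption), and the queries leave out the outer GROUP BY/COUNT(*) so the raw matches are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER, Name TEXT, City TEXT, Birthyear INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Hypothetical Person", "New York", 1959),  # hypothetical row
    ],
)

# Join on the year only: any row whose year equals ANY city's minimum matches.
year_only = [name for (name,) in conn.execute(
    """
    SELECT P.Name
    FROM people P
    JOIN (SELECT MIN(Birthyear) AS Birthyear FROM people GROUP BY City) P2
      ON P2.Birthyear = P.Birthyear
    ORDER BY P.Birthyear, P.ID
    """
)]

# Join on year and city: a row matches only its own city's minimum.
year_and_city = [name for (name,) in conn.execute(
    """
    SELECT P.Name
    FROM people P
    JOIN (SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP BY City) P2
      ON P2.Birthyear = P.Birthyear AND P2.City = P.City
    ORDER BY P.Birthyear, P.ID
    """
)]
print(year_only)
print(year_and_city)
```

With the outer GROUP BY P.City added back, the stray match would be folded into the New York group. That raises ct to 2 there, and the reported Name may come from the wrong row.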