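I have a table with a multi-column primary key (city/state/date) and many more columns of data. I want to get the latest data for each city/state. How can I do that cleanly and efficiently? Right now I can do it by running a first query to get the list of all the rows I'm after, followed by a second query with a massive WHERE clause: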
SELECT state, city, max(date) from data GROUP BY city, state;
+-------+---------------------+------------+
| state | city | MAX(date) |
+-------+---------------------+------------+
| CA | San Francisco | 2013-09-01 |
| CA | Los Angeles | 2013-08-01 |
| NY | New York | 2013-10-01 |
| ... | ... (many rows) ... | ... |
+-------+---------------------+------------+
SELECT * FROM data WHERE
(state = "CA" AND city = "San Francisco" AND date='2013-09-01') OR
(state = "CA" AND city = "Los Angeles" AND date='2013-08-01') OR
(state = "NY" AND city = "New York" AND date='2013-10-01') OR
...
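This is really ugly and inefficient, and if the first query returns a lot of rows, my second query might be too long. Clearly, if I had a single-column primary key, I could use a subselect with IN(), but that isn't really possible here. Any suggestions?
UPDATE: I tried Bill's suggestion with a subselect, but it doesn't use any keys and takes forever. If I restrict the subselect to return only 5 rows, it returns in 0.64s. If I let it return all 73 city/state combinations, it takes a very long time (the query is still running).
EXPLAIN SELECT * FROM data WHERE (state, city, date) IN (SELECT state, city, MAX(date) FROM data GROUP BY city, state);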
+----+--------------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| 1 | PRIMARY | data | ALL | NULL | NULL | NULL | NULL | 13342 | Using where |
| 2 | DEPENDENT SUBQUERY | data | index | NULL | PRIMARY | 57 | NULL | 8058 | Using index |
+----+--------------------+-------+-------+---------------+---------+---------+------+-------+-------------+
Answer 0 (score: 4)
MySQL supports tuple comparisons:
SELECT * FROM data WHERE
(state, city, date) IN (
('CA', 'San Francisco', '2013-09-01'),
('CA', 'Los Angeles', '2013-08-01'),
('NY', 'New York', '2013-10-01'));
Answer 1 (score: 4)
I think this should work for you:
select
*
from
data t1
natural join
(
select
city,
state,
max(date) as date
from
data
group by
city,
state
) t2;
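To try the derived-table join above without a MySQL server, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. Several things are assumptions for illustration only: SQLite stands in for MySQL, the `reading` column and the sample rows are made up, and the helper names `latest_rows` and `demo` are hypothetical. The join is written with an explicit `ON` clause rather than `NATURAL JOIN`, so the three key columns being matched are spelled out.

```python
import sqlite3


def latest_rows(conn):
    """Return the most recent row for each (state, city) pair."""
    sql = """
        SELECT t1.state, t1.city, t1.date, t1.reading
        FROM data AS t1
        JOIN (
            -- one row per city/state carrying its latest date
            SELECT state, city, MAX(date) AS max_date
            FROM data
            GROUP BY state, city
        ) AS t2
          ON t1.state = t2.state
         AND t1.city = t2.city
         AND t1.date = t2.max_date
        ORDER BY t1.state, t1.city
    """
    return conn.execute(sql).fetchall()


def demo():
    # Hypothetical schema: same composite key as in the question,
    # plus a single made-up data column called "reading".
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data ("
        " state TEXT, city TEXT, date TEXT, reading INTEGER,"
        " PRIMARY KEY (state, city, date))"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?)",
        [
            ("CA", "San Francisco", "2013-08-01", 1),
            ("CA", "San Francisco", "2013-09-01", 2),
            ("CA", "Los Angeles", "2013-07-01", 3),
            ("CA", "Los Angeles", "2013-08-01", 4),
            ("NY", "New York", "2013-10-01", 5),
        ],
    )
    rows = latest_rows(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because the grouped subquery sits in the FROM clause, it does not depend on the outer row, which is the property the `DEPENDENT SUBQUERY` plan in the question lacks. Whether MySQL then uses the primary key for the join depends on the version and the key's column order, so it is worth checking with EXPLAIN on the real table.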