我有下表。
+--------------------+--------------+-------+
Date | SymbolNumber | Value
+--------------------+--------------+-------+
2018-08-31 15:00:00 | 123 | data
2018-09-31 15:00:00 | 456 | data
2018-09-31 15:00:00 | 123 | data
2018-09-31 15:00:00 | 555 | data
2018-10-31 15:00:00 | 555 | data
2018-10-31 15:00:00 | 231 | data
2018-10-31 15:00:00 | 123 | data
2018-11-31 15:00:00 | 123 | data
2018-11-31 15:00:00 | 555 | data
2018-12-31 15:00:00 | 123 | data
2018-12-31 15:00:00 | 555 | data
我需要一个查询,可以选择查询中陈述的每个SymbolNumber的最后一行。
SELECT
*
FROM
MyTable
WHERE
symbolNumber IN (123, 555)
AND
**lastOfRow ordered by latest-date**
预期结果:
2018-12-31 15:00:00 | 123 | data
2018-12-31 15:00:00 | 555 | data
我该怎么做?
答案 0 :(得分:0)
首先,您将需要一个查询,以获取每个symbolNumber
的最新日期。其次,您可以inner join
到此表(使用date
)来获取其余的列。像这样:
SELECT
t.*
FROM
<table_name> AS t
INNER JOIN
(SELECT
symbolNumber,
MAX(date) AS maxDate
FROM
<table_name>
GROUP BY
symbolNumber) AS latest_date ON latest_date.symbolNumber = t.symbolNumber AND latest_date.maxDate = t.date
上一个查询将获取表上每个现有symbolNumber
的最新数据。如果要限制为symbolNumbers: 123 and 555
,则需要进行下一个修改:
SELECT
t.*
FROM
<table_name> AS t
INNER JOIN
(SELECT
symbolNumber,
MAX(date) AS maxDate
FROM
<table_name>
WHERE
symbolNumber IN (123, 555)
GROUP BY
symbolNumber) AS latest_date ON latest_date.symbolNumber = t.symbolNumber AND latest_date.maxDate = t.date
答案 1 :(得分:0)
symbolNumber
上进行“自我左联接”,并与右侧具有较高Date
值的同一组中的其他行进行匹配。这是一种解决方案避免子查询,并利用Left Join
:
SELECT t1.*
FROM MyTable AS t1
LEFT JOIN MyTable AS t2
ON t2.symbolNumber = t1.symbolNumber AND
t2.Date > t1.Date -- Joining to a row in same group with higher date
WHERE t1.symbolNumber IN (123, 555) AND
t2.symbolNumber IS NULL -- Higher date not found; so this is highest row
编辑:
基准研究,比较了Left Join
方法与衍生表(子查询)的对比
@Strawberry 在5.6.21中进行了一些基准测试。这就是他的发现...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,dense_user INT NOT NULL
,sparse_user INT NOT NULL
);
INSERT INTO my_table (dense_user,sparse_user)
SELECT RAND()*100,RAND()*100000;
INSERT INTO my_table (dense_user,sparse_user)
SELECT RAND()*100,RAND()*100000 FROM my_table;
-- REPEAT THIS LINE A FEW TIMES !!!
SELECT COUNT(DISTINCT dense_user) dense
, COUNT(DISTINCT sparse_user) sparse
, COUNT(*) total
FROM my_table;
+-------+--------+---------+
| dense | sparse | total |
+-------+--------+---------+
| 101 | 99999 | 1048576 |
+-------+--------+---------+
ALTER TABLE my_table ADD INDEX(dense_user);
ALTER TABLE my_table ADD INDEX(sparse_user);
--dense_test
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.dense_user = x.dense_user
AND y.id < x.id
WHERE y.id IS NULL
ORDER
BY dense_user
LIMIT 10;
+------+------------+-------------+
| id | dense_user | sparse_user |
+------+------------+-------------+
| 1212 | 0 | 1950 |
| 153 | 1 | 23193 |
| 255 | 2 | 27472 |
| 28 | 3 | 86440 |
| 18 | 4 | 47886 |
| 291 | 5 | 76563 |
| 15 | 6 | 85049 |
| 16 | 7 | 78384 |
| 135 | 8 | 52304 |
| 62 | 9 | 40930 |
+------+------------+-------------+
10 rows in set (2.64 sec)
SELECT x.*
FROM my_table x
JOIN
( SELECT dense_user, MIN(id) id FROM my_table GROUP BY dense_user ) y
ON y.dense_user = x.dense_user
AND y.id = x.id
ORDER
BY dense_user
LIMIT 10;
+------+------------+-------------+
| id | dense_user | sparse_user |
+------+------------+-------------+
| 1212 | 0 | 1950 |
| 153 | 1 | 23193 |
| 255 | 2 | 27472 |
| 28 | 3 | 86440 |
| 18 | 4 | 47886 |
| 291 | 5 | 76563 |
| 15 | 6 | 85049 |
| 16 | 7 | 78384 |
| 135 | 8 | 52304 |
| 62 | 9 | 40930 |
+------+------------+-------------+
10 rows in set (0.05 sec)
Uncorrelated query is 50 times faster.
--sparse test
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.sparse_user = x.sparse_user
AND y.id < x.id
WHERE y.id IS NULL
ORDER
BY sparse_user
LIMIT 10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 165055 | 75 | 0 |
| 37598 | 63 | 1 |
| 170596 | 70 | 2 |
| 46142 | 87 | 3 |
| 33546 | 21 | 4 |
| 323114 | 87 | 5 |
| 86592 | 96 | 6 |
| 156711 | 36 | 7 |
| 17148 | 62 | 8 |
| 139965 | 71 | 9 |
+--------+------------+-------------+
10 rows in set (0.03 sec)
SELECT x.*
FROM my_table x
JOIN ( SELECT sparse_user, MIN(id) id FROM my_table GROUP BY sparse_user ) y
ON y.sparse_user = x.sparse_user
AND y.id = x.id
ORDER
BY sparse_user
LIMIT 10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 165055 | 75 | 0 |
| 37598 | 63 | 1 |
| 170596 | 70 | 2 |
| 46142 | 87 | 3 |
| 33546 | 21 | 4 |
| 323114 | 87 | 5 |
| 86592 | 96 | 6 |
| 156711 | 36 | 7 |
| 17148 | 62 | 8 |
| 139965 | 71 | 9 |
+--------+------------+-------------+
10 rows in set (4.73 sec)
Exclusion Join is 150 times faster
However, as you move further up the result set, the picture begins to change very dramatically...
SELECT x.*
FROM my_table x
JOIN ( SELECT sparse_user, MIN(id) id FROM my_table GROUP BY sparse_user ) y
ON y.sparse_user = x.sparse_user
AND y.id = x.id
ORDER
BY sparse_user
LIMIT 10000,10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 9810 | 93 | 10000 |
| 162438 | 4 | 10001 |
| 467371 | 62 | 10002 |
| 8258 | 13 | 10003 |
| 297049 | 17 | 10004 |
| 68354 | 23 | 10005 |
| 192701 | 64 | 10006 |
| 176225 | 92 | 10007 |
| 156595 | 37 | 10008 |
| 318266 | 1 | 10009 |
+--------+------------+-------------+
10 rows in set (9.17 sec)
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.sparse_user = x.sparse_user
AND y.id < x.id
WHERE y.id IS NULL
ORDER
BY sparse_user
LIMIT 10000,10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 9810 | 93 | 10000 |
| 162438 | 4 | 10001 |
| 467371 | 62 | 10002 |
| 8258 | 13 | 10003 |
| 297049 | 17 | 10004 |
| 68354 | 23 | 10005 |
| 192701 | 64 | 10006 |
| 176225 | 92 | 10007 |
| 156595 | 37 | 10008 |
| 318266 | 1 | 10009 |
+--------+------------+-------------+
10 rows in set (32.19 sec) -- !!!
总而言之,在某些有限的情况下,排除联接(所谓的“草莓查询”可以(显着)更快。更一般而言,不相关的查询会更快。