How do I get the latest-dated row for each given value in a column?

Time: 2018-11-02 03:25:48

Tags: mysql

I have the following table.

+---------------------+--------------+-------+
| Date                | SymbolNumber | Value |
+---------------------+--------------+-------+
| 2018-08-31 15:00:00 | 123          | data  |
| 2018-09-31 15:00:00 | 456          | data  |
| 2018-09-31 15:00:00 | 123          | data  |
| 2018-09-31 15:00:00 | 555          | data  |
| 2018-10-31 15:00:00 | 555          | data  |
| 2018-10-31 15:00:00 | 231          | data  |
| 2018-10-31 15:00:00 | 123          | data  |
| 2018-11-31 15:00:00 | 123          | data  |
| 2018-11-31 15:00:00 | 555          | data  |
| 2018-12-31 15:00:00 | 123          | data  |
| 2018-12-31 15:00:00 | 555          | data  |
+---------------------+--------------+-------+

I need a query that selects the last row (by latest date) for each SymbolNumber listed in the query.

SELECT
    *
FROM
    MyTable
WHERE
    symbolNumber IN (123, 555)
AND
    **lastOfRow ordered by latest-date**

Expected result:

 2018-12-31 15:00:00 | 123 | data
 2018-12-31 15:00:00 | 555 | data

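How can I do this?

2 answers:

Answer 0: (score: 0)

First, you need a query that gets the latest date for each symbolNumber. Second, you can inner join the table to that result (on both symbolNumber and date) to get the remaining columns. Like this: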

SELECT
    t.*
FROM
   <table_name> AS t
INNER JOIN
    (SELECT
        symbolNumber,
        MAX(date) AS maxDate
    FROM
        <table_name>
    GROUP BY
       symbolNumber) AS latest_date
    ON latest_date.symbolNumber = t.symbolNumber
   AND latest_date.maxDate = t.date

The query above returns the latest row for every symbolNumber in the table. To restrict it to symbolNumbers 123 and 555, modify it as follows:

SELECT
    t.*
FROM
   <table_name> AS t
INNER JOIN
    (SELECT
        symbolNumber,
        MAX(date) AS maxDate
    FROM
        <table_name>
    WHERE
        symbolNumber IN (123, 555)
    GROUP BY
       symbolNumber) AS latest_date
    ON latest_date.symbolNumber = t.symbolNumber
   AND latest_date.maxDate = t.date
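
To try the filtered query end to end, here is a minimal sketch against sample data. The column types are assumptions (the question does not state them), and the 09-31 and 11-31 dates from the question are replaced with 09-30 and 11-30 because they are not valid calendar dates and a strict-mode DATETIME column would reject them.

```sql
-- Assumed schema: the column types are not given in the question.
CREATE TABLE MyTable (
    `Date`       DATETIME    NOT NULL,
    SymbolNumber INT         NOT NULL,
    `Value`      VARCHAR(20) NOT NULL
);

-- Sample rows from the question, with the invalid dates adjusted.
INSERT INTO MyTable (`Date`, SymbolNumber, `Value`) VALUES
    ('2018-08-31 15:00:00', 123, 'data'),
    ('2018-09-30 15:00:00', 456, 'data'),
    ('2018-09-30 15:00:00', 123, 'data'),
    ('2018-09-30 15:00:00', 555, 'data'),
    ('2018-10-31 15:00:00', 555, 'data'),
    ('2018-10-31 15:00:00', 231, 'data'),
    ('2018-10-31 15:00:00', 123, 'data'),
    ('2018-11-30 15:00:00', 123, 'data'),
    ('2018-11-30 15:00:00', 555, 'data'),
    ('2018-12-31 15:00:00', 123, 'data'),
    ('2018-12-31 15:00:00', 555, 'data');

-- Latest row per symbol, restricted to symbols 123 and 555.
SELECT t.*
FROM MyTable AS t
INNER JOIN (
    SELECT SymbolNumber, MAX(`Date`) AS maxDate
    FROM MyTable
    WHERE SymbolNumber IN (123, 555)
    GROUP BY SymbolNumber
) AS latest_date
    ON latest_date.SymbolNumber = t.SymbolNumber
   AND latest_date.maxDate = t.`Date`
ORDER BY t.SymbolNumber;
-- Returns the two 2018-12-31 15:00:00 rows, one for 123 and one for 555.
```

Note that if a symbol has two rows sharing the same maximum date, the join matches both, so both are returned.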

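Answer 1: (score: 0)

  • We can do a self left join on symbolNumber, matching each row against the other rows in the same group that have a higher Date value.
  • We then keep only the rows for which no higher date was found, which means the current row holds the highest date in its group.

This solution avoids a subquery and uses a LEFT JOIN instead: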

SELECT t1.* 
FROM MyTable AS t1 
LEFT JOIN MyTable AS t2 
  ON t2.symbolNumber = t1.symbolNumber AND 
     t2.Date > t1.Date -- Joining to a row in same group with higher date
WHERE t1.symbolNumber IN (123, 555) AND 
      t2.symbolNumber IS NULL  -- Higher date not found; so this is highest row
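
To see why the IS NULL filter picks out the latest row, it can help to look at the join before filtering. The sketch below uses a hypothetical table demo_rows with three rows for one symbol; the table name and its contents are illustrative only. A regular table is used rather than a TEMPORARY one, because MySQL does not allow a temporary table to be referenced twice in the same query.

```sql
-- Hypothetical demo table, not part of the original question.
CREATE TABLE demo_rows (
    `Date`       DATETIME NOT NULL,
    symbolNumber INT      NOT NULL
);

INSERT INTO demo_rows (`Date`, symbolNumber) VALUES
    ('2018-10-31 15:00:00', 123),
    ('2018-11-30 15:00:00', 123),
    ('2018-12-31 15:00:00', 123);

-- Same join as above, but without the IS NULL filter.
SELECT t1.`Date` AS t1_date, t2.`Date` AS t2_date
FROM demo_rows AS t1
LEFT JOIN demo_rows AS t2
  ON t2.symbolNumber = t1.symbolNumber
 AND t2.`Date` > t1.`Date`
ORDER BY t1.`Date`, t2.`Date`;
-- t1_date             | t2_date
-- 2018-10-31 15:00:00 | 2018-11-30 15:00:00
-- 2018-10-31 15:00:00 | 2018-12-31 15:00:00
-- 2018-11-30 15:00:00 | 2018-12-31 15:00:00
-- 2018-12-31 15:00:00 | NULL
```

Only the 2018-12-31 row has no later partner, so it is the only one that survives the IS NULL check. As with the derived-table approach, rows that tie on the highest date are all returned, since none of them has a strictly later match.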

Edit:

A benchmark comparing the LEFT JOIN approach with the derived table (subquery) approach

@Strawberry ran some benchmarks on MySQL 5.6.21. Here is what he found...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,dense_user INT NOT NULL
,sparse_user INT NOT NULL
);

INSERT INTO my_table (dense_user,sparse_user) 
SELECT RAND()*100,RAND()*100000;

INSERT INTO my_table (dense_user,sparse_user)
SELECT RAND()*100,RAND()*100000 FROM my_table;
-- Run the statement above 20 times in total; each run doubles the row count,
-- giving 2^20 = 1,048,576 rows.

SELECT COUNT(DISTINCT dense_user) dense
     , COUNT(DISTINCT sparse_user) sparse
     , COUNT(*) total 
  FROM my_table;
+-------+--------+---------+
| dense | sparse | total   |
+-------+--------+---------+
|   101 |  99999 | 1048576 |
+-------+--------+---------+

ALTER TABLE my_table ADD INDEX(dense_user);

ALTER TABLE my_table ADD INDEX(sparse_user);

-- dense test
SELECT x.* 
  FROM my_table x 
  LEFT 
  JOIN my_table y 
    ON y.dense_user = x.dense_user 
   AND y.id < x.id 
 WHERE y.id IS NULL 
 ORDER 
    BY dense_user 
 LIMIT 10;
+------+------------+-------------+
| id   | dense_user | sparse_user |
+------+------------+-------------+
| 1212 |          0 |        1950 |
|  153 |          1 |       23193 |
|  255 |          2 |       27472 |
|   28 |          3 |       86440 |
|   18 |          4 |       47886 |
|  291 |          5 |       76563 |
|   15 |          6 |       85049 |
|   16 |          7 |       78384 |
|  135 |          8 |       52304 |
|   62 |          9 |       40930 |
+------+------------+-------------+
10 rows in set (2.64 sec)

SELECT x.* 
  FROM my_table x 
  JOIN 
     ( SELECT dense_user, MIN(id) id FROM my_table GROUP BY dense_user ) y 
    ON y.dense_user = x.dense_user 
   AND y.id = x.id 
 ORDER 
    BY dense_user 
 LIMIT 10;
+------+------------+-------------+
| id   | dense_user | sparse_user |
+------+------------+-------------+
| 1212 |          0 |        1950 |
|  153 |          1 |       23193 |
|  255 |          2 |       27472 |
|   28 |          3 |       86440 |
|   18 |          4 |       47886 |
|  291 |          5 |       76563 |
|   15 |          6 |       85049 |
|   16 |          7 |       78384 |
|  135 |          8 |       52304 |
|   62 |          9 |       40930 |
+------+------------+-------------+
10 rows in set (0.05 sec)

The uncorrelated query is about 50 times faster (0.05 sec vs. 2.64 sec).

-- sparse test
SELECT x.* 
  FROM my_table x 
  LEFT 
  JOIN my_table y 
    ON y.sparse_user = x.sparse_user 
   AND y.id < x.id 
 WHERE y.id IS NULL 
 ORDER 
    BY sparse_user 
 LIMIT 10;
+--------+------------+-------------+
| id     | dense_user | sparse_user |
+--------+------------+-------------+
| 165055 |         75 |           0 |
|  37598 |         63 |           1 |
| 170596 |         70 |           2 |
|  46142 |         87 |           3 |
|  33546 |         21 |           4 |
| 323114 |         87 |           5 |
|  86592 |         96 |           6 |
| 156711 |         36 |           7 |
|  17148 |         62 |           8 |
| 139965 |         71 |           9 |
+--------+------------+-------------+
10 rows in set (0.03 sec)

SELECT x.* 
  FROM my_table x 
  JOIN ( SELECT sparse_user, MIN(id) id FROM my_table GROUP BY sparse_user ) y 
    ON y.sparse_user = x.sparse_user 
   AND y.id = x.id 
 ORDER 
    BY sparse_user 
 LIMIT 10;
+--------+------------+-------------+
| id     | dense_user | sparse_user |
+--------+------------+-------------+
| 165055 |         75 |           0 |
|  37598 |         63 |           1 |
| 170596 |         70 |           2 |
|  46142 |         87 |           3 |
|  33546 |         21 |           4 |
| 323114 |         87 |           5 |
|  86592 |         96 |           6 |
| 156711 |         36 |           7 |
|  17148 |         62 |           8 |
| 139965 |         71 |           9 |
+--------+------------+-------------+
10 rows in set (4.73 sec)

The exclusion join is about 150 times faster (0.03 sec vs. 4.73 sec).

However, as you move further up the result set, the picture begins to change very dramatically...

SELECT x.* 
  FROM my_table x 
  JOIN ( SELECT sparse_user, MIN(id) id FROM my_table GROUP BY sparse_user ) y 
    ON y.sparse_user = x.sparse_user 
   AND y.id = x.id 
 ORDER 
    BY sparse_user 
 LIMIT 10000,10; 
+--------+------------+-------------+
| id     | dense_user | sparse_user |
+--------+------------+-------------+
|   9810 |         93 |       10000 |
| 162438 |          4 |       10001 |
| 467371 |         62 |       10002 |
|   8258 |         13 |       10003 |
| 297049 |         17 |       10004 |
|  68354 |         23 |       10005 |
| 192701 |         64 |       10006 |
| 176225 |         92 |       10007 |
| 156595 |         37 |       10008 |
| 318266 |          1 |       10009 |
+--------+------------+-------------+
10 rows in set (9.17 sec)

SELECT x.* 
  FROM my_table x 
  LEFT 
  JOIN my_table y 
    ON y.sparse_user = x.sparse_user 
   AND y.id < x.id 
 WHERE y.id IS NULL 
 ORDER 
    BY sparse_user 
 LIMIT 10000,10;
+--------+------------+-------------+
| id     | dense_user | sparse_user |
+--------+------------+-------------+
|   9810 |         93 |       10000 |
| 162438 |          4 |       10001 |
| 467371 |         62 |       10002 |
|   8258 |         13 |       10003 |
| 297049 |         17 |       10004 |
|  68354 |         23 |       10005 |
| 192701 |         64 |       10006 |
| 176225 |         92 |       10007 |
| 156595 |         37 |       10008 |
| 318266 |          1 |       10009 |
+--------+------------+-------------+
10 rows in set (32.19 sec) -- !!!

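In summary, in some limited cases the exclusion join (the so-called "Strawberry query") can be (significantly) faster. More generally, the uncorrelated query will be faster.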
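
Since the winner depends on data distribution and offset, one way to check your own case is to compare the execution plans with EXPLAIN. The sketch below assumes the MyTable schema from the question. The composite index and its name idx_symbol_date are assumptions, not something the benchmark above used; such an index may help either form, but that should be verified on real data.

```sql
-- Assumed composite index; the name idx_symbol_date is hypothetical.
ALTER TABLE MyTable ADD INDEX idx_symbol_date (symbolNumber, `Date`);

-- Plan for the exclusion join (LEFT JOIN ... IS NULL).
EXPLAIN
SELECT t1.*
FROM MyTable AS t1
LEFT JOIN MyTable AS t2
  ON t2.symbolNumber = t1.symbolNumber
 AND t2.`Date` > t1.`Date`
WHERE t1.symbolNumber IN (123, 555)
  AND t2.symbolNumber IS NULL;

-- Plan for the uncorrelated derived-table join.
EXPLAIN
SELECT t.*
FROM MyTable AS t
INNER JOIN (
    SELECT symbolNumber, MAX(`Date`) AS maxDate
    FROM MyTable
    WHERE symbolNumber IN (123, 555)
    GROUP BY symbolNumber
) AS latest_date
    ON latest_date.symbolNumber = t.symbolNumber
   AND latest_date.maxDate = t.`Date`;
```

Comparing the estimated row counts and chosen keys in the two plans, and then timing both queries on a realistic data volume, gives a firmer basis than the general rule alone.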