Select rows with the most recent date per user

Asked: 2013-06-11 06:57:25

Tags: mysql sql greatest-n-per-group

I have a table ("lms_attendance") of users' check-in and check-out times that looks like this:

id  user    time    io (enum)
1   9   1370931202  out
2   9   1370931664  out
3   6   1370932128  out
4   12  1370932128  out
5   12  1370933037  in

I'm trying to create a view of this table that outputs only the most recent record per user id, while still giving me the "in" or "out" value, like so:

id  user    time    io
2   9   1370931664  out
3   6   1370932128  out
5   12  1370933037  in

I'm pretty close so far, but I realized that views won't accept subqueries, which makes this a lot harder. The closest query I got was:

select 
    `lms_attendance`.`id` AS `id`,
    `lms_attendance`.`user` AS `user`,
    max(`lms_attendance`.`time`) AS `time`,
    `lms_attendance`.`io` AS `io` 
from `lms_attendance` 
group by 
    `lms_attendance`.`user`, 
    `lms_attendance`.`io`

But what I get is:

id  user    time    io
3   6   1370932128  out
1   9   1370931664  out
5   12  1370933037  in
4   12  1370932128  out

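Which is close, but not perfect. I know that the last GROUP BY column (io) shouldn't be there, but without it the query returns the most recent time without its corresponding io value.

Any ideas? Thanks!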
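
The extra row in the output above comes from grouping on both user and io: user 12 has one "in" row and one "out" row, so it forms two groups. A minimal sketch that reproduces this, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the in-memory database and the variable names are illustrative, not part of the original question):

```python
import sqlite3

# Sample data copied from the question.
rows = [
    (1, 9, 1370931202, "out"),
    (2, 9, 1370931664, "out"),
    (3, 6, 1370932128, "out"),
    (4, 12, 1370932128, "out"),
    (5, 12, 1370933037, "in"),
]

# Assumption: SQLite in memory stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany("INSERT INTO lms_attendance VALUES (?, ?, ?, ?)", rows)

# Grouping by (user, io) yields one group per distinct pair, not one per user.
grouped = conn.execute(
    """
    SELECT user, io, MAX(time)
    FROM lms_attendance
    GROUP BY user, io
    ORDER BY user, io
    """
).fetchall()

for row in grouped:
    print(row)
```

Only the grouped columns and the aggregate are selected here, so the result does not depend on how the engine picks values for non-grouped columns such as id.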

12 Answers:

Answer 0: (score: 169)

Query:

SQLFIDDLEExample

SELECT t1.*
FROM lms_attendance t1
WHERE t1.time = (SELECT MAX(t2.time)
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user)

Result:

| ID | USER |       TIME |  IO |
--------------------------------
|  2 |    9 | 1370931664 | out |
|  3 |    6 | 1370932128 | out |
|  5 |   12 | 1370933037 |  in |

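A solution that returns exactly one row per user every time (it assumes the row with the highest id is the most recent one):

SQLFIDDLEExample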

SELECT t1.*
FROM lms_attendance t1
WHERE t1.id = (SELECT t2.id
                 FROM lms_attendance t2
                 WHERE t2.user = t1.user            
                 ORDER BY t2.id DESC
                 LIMIT 1)
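
The difference between the two queries only shows up when a user has two rows with the same maximum time. A minimal sketch of that case, assuming Python's built-in sqlite3 module as a stand-in for MySQL; row 6 is a hypothetical extra row added to force a tie and is not part of the question's data:

```python
import sqlite3

rows = [
    (1, 9, 1370931202, "out"),
    (2, 9, 1370931664, "out"),
    (3, 6, 1370932128, "out"),
    (4, 12, 1370932128, "out"),
    (5, 12, 1370933037, "in"),
    (6, 12, 1370933037, "out"),  # hypothetical row: same time as id 5
]

# Assumption: SQLite in memory stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany("INSERT INTO lms_attendance VALUES (?, ?, ?, ?)", rows)

# First query: match on the per-user maximum time.
ids_by_max_time = [
    r[0]
    for r in conn.execute(
        """
        SELECT t1.id
        FROM lms_attendance t1
        WHERE t1.time = (SELECT MAX(t2.time)
                         FROM lms_attendance t2
                         WHERE t2.user = t1.user)
        ORDER BY t1.id
        """
    )
]

# Second query: match on the per-user highest id.
ids_by_last_id = [
    r[0]
    for r in conn.execute(
        """
        SELECT t1.id
        FROM lms_attendance t1
        WHERE t1.id = (SELECT t2.id
                       FROM lms_attendance t2
                       WHERE t2.user = t1.user
                       ORDER BY t2.id DESC
                       LIMIT 1)
        ORDER BY t1.id
        """
    )
]

print(ids_by_max_time)
print(ids_by_last_id)
```

With the tie in place, the first query returns both rows 5 and 6 for user 12, while the second returns only row 6.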

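Answer 1: (score: 66)

No need to try to reinvent the wheel, as this is the common greatest-n-per-group problem. A very nice solution is presented.

I prefer the simplest solution (see SQLFiddle, updated Justin's) without subqueries (and therefore easy to use in views):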

SELECT t1.*
FROM lms_attendance AS t1
LEFT OUTER JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND (t1.time < t2.time 
         OR (t1.time = t2.time AND t1.Id < t2.Id))
WHERE t2.user IS NULL

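This also works when two different records in the same group share the same greatest value, thanks to the trick with (t1.time = t2.time AND t1.Id < t2.Id). All I'm doing here is making sure that if two records of the same user have the same time, only one is chosen. It doesn't actually matter whether the criterion is Id or something else; basically any criterion that is guaranteed to be unique will do the job here.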
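
A minimal sketch of this join wrapped in a view, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The view name latest_attendance and row 6 (added to force a tie on time) are hypothetical and not part of the original thread:

```python
import sqlite3

rows = [
    (1, 9, 1370931202, "out"),
    (2, 9, 1370931664, "out"),
    (3, 6, 1370932128, "out"),
    (4, 12, 1370932128, "out"),
    (5, 12, 1370933037, "in"),
    (6, 12, 1370933037, "out"),  # hypothetical row: same time as id 5
]

# Assumption: SQLite in memory stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany("INSERT INTO lms_attendance VALUES (?, ?, ?, ?)", rows)

# The self-join keeps a row only if no "later" row exists for the same user,
# where "later" means a greater time, or an equal time with a greater id.
conn.execute(
    """
    CREATE VIEW latest_attendance AS
    SELECT t1.*
    FROM lms_attendance AS t1
    LEFT OUTER JOIN lms_attendance AS t2
      ON t1.user = t2.user
     AND (t1.time < t2.time
          OR (t1.time = t2.time AND t1.id < t2.id))
    WHERE t2.user IS NULL
    """
)

latest = conn.execute("SELECT * FROM latest_attendance ORDER BY id").fetchall()
for row in latest:
    print(row)
```

Even with the tie between rows 5 and 6, the view returns one row per user; the id comparison picks row 6 for user 12.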

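Answer 2: (score: 5)

Based on @TMS's answer, which I like because it needs no subqueries, but I think omitting the 'OR' part is sufficient and makes the query much simpler to read and understand.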

SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND t1.time < t2.time
WHERE t2.user IS NULL

If you are not interested in rows with NULL times, you can filter them out in the WHERE clause:

SELECT t1.*
FROM lms_attendance AS t1
LEFT JOIN lms_attendance AS t2
  ON t1.user = t2.user 
        AND t1.time < t2.time
WHERE t2.user IS NULL and t1.time IS NOT NULL
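
A minimal sketch of how the simplified join behaves with tied and NULL times, assuming Python's built-in sqlite3 module as a stand-in for MySQL. Rows 6 and 7 are hypothetical additions (a tie on time and a NULL time), and the query template with its extra placeholder is only an illustration device:

```python
import sqlite3

rows = [
    (1, 9, 1370931202, "out"),
    (2, 9, 1370931664, "out"),
    (3, 6, 1370932128, "out"),
    (4, 12, 1370932128, "out"),
    (5, 12, 1370933037, "in"),
    (6, 12, 1370933037, "out"),  # hypothetical row: same time as id 5
    (7, 9, None, "in"),          # hypothetical row: NULL time
]

# Assumption: SQLite in memory stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany("INSERT INTO lms_attendance VALUES (?, ?, ?, ?)", rows)

# Join without the tie-breaker; {extra} optionally appends the NULL filter.
query = """
    SELECT t1.id
    FROM lms_attendance AS t1
    LEFT JOIN lms_attendance AS t2
      ON t1.user = t2.user
     AND t1.time < t2.time
    WHERE t2.user IS NULL {extra}
    ORDER BY t1.id
"""

without_filter = [r[0] for r in conn.execute(query.format(extra=""))]
with_filter = [
    r[0] for r in conn.execute(query.format(extra="AND t1.time IS NOT NULL"))
]

print(without_filter)
print(with_filter)
```

Without the tie-breaker, both tied rows (5 and 6) come back for user 12. The NULL-time row 7 is returned unless it is filtered, because a comparison with NULL never matches in the join condition.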

Answer 3: (score: 4)

Already solved, but just for the record, another approach would be to create two views...

CREATE TABLE lms_attendance
(id int, user int, time int, io varchar(3));

CREATE VIEW latest_all AS
SELECT la.user, max(la.time) time
FROM lms_attendance la 
GROUP BY la.user;

CREATE VIEW latest_io AS
SELECT la.* 
FROM lms_attendance la
JOIN latest_all lall 
    ON lall.user = la.user
    AND lall.time = la.time;

INSERT INTO lms_attendance 
VALUES
(1, 9, 1370931202, 'out'),
(2, 9, 1370931664, 'out'),
(3, 6, 1370932128, 'out'),
(4, 12, 1370932128, 'out'),
(5, 12, 1370933037, 'in');

SELECT * FROM latest_io;

Click here to see it in action at SQL Fiddle

Answer 4: (score: 2)

Try this query:

  select id,user, max(time), io 
  FROM lms_attendance group by user;

Answer 5: (score: 0)

select b.* from 

    (select 
        `lms_attendance`.`user` AS `user`,
        max(`lms_attendance`.`time`) AS `time`
    from `lms_attendance` 
    group by 
        `lms_attendance`.`user`) a

join

    (select * 
    from `lms_attendance` ) b

on a.user = b.user
and a.time = b.time

Answer 6: (score: 0)

 select result from (
     select vorsteuerid as result, count(*) as anzahl from kreditorenrechnung where kundeid = 7148
     group by vorsteuerid
 ) a order by anzahl desc limit 0,1

Answer 7: (score: 0)

Well, this might be a hack or error-prone, but somehow it works as well.

SELECT id, MAX(user) as user, MAX(time) as time, MAX(io) as io FROM lms_attendance GROUP BY id;

Answer 8: (score: 0)

If you are on MySQL 8.0 or higher, you can use window functions:

Query:

DBFiddleExample

SELECT DISTINCT
FIRST_VALUE(ID) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS ID,
FIRST_VALUE(USER) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS USER,
FIRST_VALUE(TIME) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS TIME,
FIRST_VALUE(IO) OVER (PARTITION BY lms_attendance.USER ORDER BY lms_attendance.TIME DESC) AS IO
FROM lms_attendance;

Result:

| ID | USER |       TIME |  IO |
--------------------------------
|  2 |    9 | 1370931664 | out |
|  3 |    6 | 1370932128 | out |
|  5 |   12 | 1370933037 |  in |

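The advantage I see over the solution proposed by Justin is that it lets you select the row with the most recent data per user (or per id, or per whatever) even from subqueries, without needing an intermediate view or table.

And if you're running HANA, it is also about 7 times faster :D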
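
A related window-function formulation, shown here as an alternative and not as the query from this answer, ranks each user's rows with ROW_NUMBER() and keeps rank 1. This is a minimal sketch assuming Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (it needs an SQLite build of 3.25 or newer for window functions); the aliases ranked and rn are illustrative:

```python
import sqlite3

# Sample data copied from the question.
rows = [
    (1, 9, 1370931202, "out"),
    (2, 9, 1370931664, "out"),
    (3, 6, 1370932128, "out"),
    (4, 12, 1370932128, "out"),
    (5, 12, 1370933037, "in"),
]

# Assumption: SQLite in memory stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lms_attendance (id INTEGER, user INTEGER, time INTEGER, io TEXT)"
)
conn.executemany("INSERT INTO lms_attendance VALUES (?, ?, ?, ?)", rows)

# Number each user's rows from newest to oldest; id breaks ties on time.
latest = conn.execute(
    """
    SELECT id, user, time, io
    FROM (
        SELECT id, user, time, io,
               ROW_NUMBER() OVER (
                   PARTITION BY user
                   ORDER BY time DESC, id DESC
               ) AS rn
        FROM lms_attendance
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

for row in latest:
    print(row)
```

Because ROW_NUMBER() assigns exactly one row the value 1 per partition, this form returns a single row per user even when times tie, and it avoids repeating the window definition for every column.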

Answer 9: (score: 0)

I have done something like the below:

SELECT t1.*
FROM lms_attendance t1
WHERE t1.id IN (SELECT MAX(t2.id) AS id
                FROM lms_attendance t2
                GROUP BY t2.user)

This will also reduce memory utilization.

Thanks.

Answer 10: (score: -2)

You can group by user and then order by time in descending order, something like the below:

  SELECT * FROM lms_attendance group by user order by time desc;

Answer 11: (score: -3)

This worked for me:

SELECT user, time FROM 
(
    SELECT user, time FROM lms_attendance -- where clause
) AS T 
WHERE (SELECT COUNT(0) FROM lms_attendance WHERE user = T.user AND time > T.time) = 0
ORDER BY user ASC, time DESC