Select the maximum value from duplicated rows

Time: 2018-07-18 14:13:48

Tags: mysql sql

Table structure (posted as an image in the original question, not reproduced here).

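Some records in the table have duplicate values in the "code" field. These records also have a deleted_date column, which records when the record was deleted. I wrote this query: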

SELECT id 
FROM analyzes 
WHERE code IN (
    SELECT code 
    FROM analyzes 
    WHERE deleted = 1 
    GROUP BY code 
    HAVING count(code)>1
)
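To see what the query actually returns, the sketch below runs it unchanged against an in-memory SQLite database loaded with the sample rows from Answer 2's schema setup. SQLite standing in for MySQL is an assumption made so the example is self-contained; the names ROWS, make_db and QUESTION_QUERY are hypothetical. Because the IN subquery only selects codes, the outer query returns every row carrying one of those codes (including the undeleted row 19), not one row per code with the latest deleted_date.

```python
import sqlite3

# Sample rows from Answer 2's schema setup: (id, code, deleted, deleted_date).
ROWS = [
    (1, "01.00.002", 1, "2018-01-01"),
    (2, "01.00.002", 1, "2018-02-01"),
    (15, "01.00.002", 1, "2018-03-01"),
    (3, "01.00.005", 1, "2018-01-21"),
    (17, "01.00.005", 1, "2018-01-10"),
    (16, "01.00.006", 0, None),
    (18, "01.00.007", 1, "2018-01-01"),
    (19, "01.00.007", 0, None),
    (42, "01.00.007", 1, "2018-01-25"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL table.

    Dates are stored as ISO 'YYYY-MM-DD' text, so max() on them is chronological.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analyzes (id INTEGER, code TEXT, deleted INTEGER, deleted_date TEXT)"
    )
    conn.executemany("INSERT INTO analyzes VALUES (?, ?, ?, ?)", ROWS)
    return conn


# The query from the question, unchanged apart from whitespace.
QUESTION_QUERY = """
SELECT id
FROM analyzes
WHERE code IN (
    SELECT code
    FROM analyzes
    WHERE deleted = 1
    GROUP BY code
    HAVING count(code) > 1
)
"""

conn = make_db()
ids = sorted(row[0] for row in conn.execute(QUESTION_QUERY))
print(ids)  # → [1, 2, 3, 15, 17, 18, 19, 42]
```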

But it does not work correctly. Could someone with a lot of SQL experience tell me whether this request can be expressed in SQL at all?

3 answers:

Answer 0 (score: 0)

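The codes returned by your subquery are not enough to identify the right rows in the outer query; you also need to match on the correct deleted_date, namely the maximum one. Here is a join-based version: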

SELECT a1.id
FROM analyzes a1
INNER JOIN (
    SELECT code, max(deleted_date) AS dd
    FROM analyzes
    WHERE deleted = 1
    GROUP BY code
    HAVING count(code) > 1
) a2 ON a1.code = a2.code AND a1.deleted_date = a2.dd
WHERE a1.deleted = 1
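A quick way to check this answer is to run it on the sample rows from Answer 2's schema setup. The sketch below does that in an in-memory SQLite database (an assumption standing in for MySQL; ROWS, make_db and ANSWER0_QUERY are hypothetical names) using the query as originally posted. It returns one row per code, but it also keeps code 01.00.007 (id 42) even though that code still has an undeleted row (id 19), and the HAVING count(code) > 1 filter would drop any code with only one deleted row. Answers 1 and 2 behave differently on both points.

```python
import sqlite3

# Sample rows from Answer 2's schema setup: (id, code, deleted, deleted_date).
ROWS = [
    (1, "01.00.002", 1, "2018-01-01"),
    (2, "01.00.002", 1, "2018-02-01"),
    (15, "01.00.002", 1, "2018-03-01"),
    (3, "01.00.005", 1, "2018-01-21"),
    (17, "01.00.005", 1, "2018-01-10"),
    (16, "01.00.006", 0, None),
    (18, "01.00.007", 1, "2018-01-01"),
    (19, "01.00.007", 0, None),
    (42, "01.00.007", 1, "2018-01-25"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analyzes (id INTEGER, code TEXT, deleted INTEGER, deleted_date TEXT)"
    )
    conn.executemany("INSERT INTO analyzes VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Answer 0's query as posted: comma join with a derived table of per-code maxima.
ANSWER0_QUERY = """
SELECT id
FROM analyzes a1, (SELECT code, max(deleted_date) AS dd
    FROM analyzes
    WHERE deleted = 1
    GROUP BY code
    HAVING count(code) > 1) a2
WHERE a1.code = a2.code AND a1.deleted_date = dd AND a1.deleted = 1
"""

conn = make_db()
ids = sorted(row[0] for row in conn.execute(ANSWER0_QUERY))
print(ids)  # → [3, 15, 42]
```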

Answer 1 (score: 0)

You can combine a subquery with NOT EXISTS:

select a.*
from analyzes a
where not exists (select 1 from analyzes a1 where a1.code = a.code and a1.deleted = 0) and
      deleted_date = (select max(a1.deleted_date) from analyzes a1 where a1.code = a.code);
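The NOT EXISTS clause drops every code that still has an undeleted row, and the correlated max() subquery then keeps only the latest deleted row of each remaining code. The sketch below runs this answer's query unchanged on the sample rows from Answer 2's schema setup, in an in-memory SQLite database assumed as a stand-in for MySQL (ROWS, make_db and ANSWER1_QUERY are hypothetical names). The result matches the table shown under Answer 2.

```python
import sqlite3

# Sample rows from Answer 2's schema setup: (id, code, deleted, deleted_date).
ROWS = [
    (1, "01.00.002", 1, "2018-01-01"),
    (2, "01.00.002", 1, "2018-02-01"),
    (15, "01.00.002", 1, "2018-03-01"),
    (3, "01.00.005", 1, "2018-01-21"),
    (17, "01.00.005", 1, "2018-01-10"),
    (16, "01.00.006", 0, None),
    (18, "01.00.007", 1, "2018-01-01"),
    (19, "01.00.007", 0, None),
    (42, "01.00.007", 1, "2018-01-25"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analyzes (id INTEGER, code TEXT, deleted INTEGER, deleted_date TEXT)"
    )
    conn.executemany("INSERT INTO analyzes VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Answer 1's query as posted.
ANSWER1_QUERY = """
select a.*
from analyzes a
where not exists (select 1 from analyzes a1 where a1.code = a.code and a1.deleted = 0) and
      deleted_date = (select max(a1.deleted_date) from analyzes a1 where a1.code = a.code)
"""

conn = make_db()
rows = sorted(conn.execute(ANSWER1_QUERY).fetchall())  # sort: no ORDER BY in the query
print(rows)  # → [(3, '01.00.005', 1, '2018-01-21'), (15, '01.00.002', 1, '2018-03-01')]
```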

Answer 2 (score: 0)

SQL Fiddle

MySQL 5.6 schema setup

CREATE TABLE analyzes ( ID int, code varchar(10), deleted bit, deleted_Date date) ;

INSERT INTO analyzes (ID, code, deleted, deleted_Date)
SELECT 1, '01.00.002', 1, '2018-01-01' UNION ALL
SELECT 2, '01.00.002', 1, '2018-02-01' UNION ALL
SELECT 15, '01.00.002', 1, '2018-03-01' UNION ALL
SELECT 3, '01.00.005', 1, '2018-01-21' UNION ALL
SELECT 17, '01.00.005', 1, '2018-01-10' UNION ALL
SELECT 16, '01.00.006', 0, null UNION ALL
SELECT 18, '01.00.007', 1, '2018-01-01' UNION ALL
SELECT 19, '01.00.007', 0, null UNION ALL
SELECT 42, '01.00.007', 1, '2018-01-25'
;

Main query

SELECT a.* 
FROM analyzes a
INNER JOIN (
  SELECT t1.code, max(t1.deleted_date) AS maxDel
  FROM analyzes t1
  LEFT OUTER JOIN analyzes t2 ON t1.code = t2.code
    AND t2.deleted = 0
  WHERE t2.id IS NULL
  GROUP BY t1.code
) s1 ON a.code = s1.code and a.Deleted_Date = s1.maxDel

Results

| ID |      code | deleted | deleted_Date |
|----|-----------|---------|--------------|
| 15 | 01.00.002 |    true |   2018-03-01 |
|  3 | 01.00.005 |    true |   2018-01-21 |

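This query first uses a subselect to get each code together with its maximum deleted_date. That subselect uses the LEFT JOIN ... WHERE ... IS NULL pattern to eliminate any code that has at least one record that is not deleted. This should scale well for large data sets. The outer query then INNER JOINs back to the subselect on code and the computed maximum deleted_date.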
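The two steps described above can be checked separately. The sketch below, a minimal illustration assuming an in-memory SQLite database as a stand-in for MySQL (ROWS, make_db, INNER_QUERY and MAIN_QUERY are hypothetical names), first runs only the LEFT JOIN ... IS NULL subselect, which leaves the fully deleted codes with their latest deleted_date, and then the whole main query, which reproduces the Results table above (SQLite reports the deleted flag as 1 rather than true).

```python
import sqlite3

# Sample rows from the schema setup above: (id, code, deleted, deleted_date).
ROWS = [
    (1, "01.00.002", 1, "2018-01-01"),
    (2, "01.00.002", 1, "2018-02-01"),
    (15, "01.00.002", 1, "2018-03-01"),
    (3, "01.00.005", 1, "2018-01-21"),
    (17, "01.00.005", 1, "2018-01-10"),
    (16, "01.00.006", 0, None),
    (18, "01.00.007", 1, "2018-01-01"),
    (19, "01.00.007", 0, None),
    (42, "01.00.007", 1, "2018-01-25"),
]


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the MySQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analyzes (id INTEGER, code TEXT, deleted INTEGER, deleted_date TEXT)"
    )
    conn.executemany("INSERT INTO analyzes VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Step 1: anti-join. A t1 row survives only if no undeleted t2 row shares its code.
INNER_QUERY = """
SELECT t1.code, max(t1.deleted_date) AS maxDel
FROM analyzes t1
LEFT OUTER JOIN analyzes t2 ON t1.code = t2.code
  AND t2.deleted = 0
WHERE t2.id IS NULL
GROUP BY t1.code
"""

# Step 2: join back on code and the per-code maximum deleted_date.
MAIN_QUERY = f"""
SELECT a.*
FROM analyzes a
INNER JOIN ({INNER_QUERY}) s1 ON a.code = s1.code AND a.deleted_date = s1.maxDel
"""

conn = make_db()
inner = sorted(conn.execute(INNER_QUERY).fetchall())
result = sorted(conn.execute(MAIN_QUERY).fetchall())
print(inner)   # → [('01.00.002', '2018-03-01'), ('01.00.005', '2018-01-21')]
print(result)  # → [(3, '01.00.005', 1, '2018-01-21'), (15, '01.00.002', 1, '2018-03-01')]
```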