Find duplicates based on multiple columns per record

Date: 2019-01-07 13:25:52

Tags: sql greatest-n-per-group

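I need to pull only the duplicates from a table, and always only the later occurrence(s), not the earliest one. Let's call the table personToCostcenter. Duplicates are determined by the two columns person_id and costcenter_id.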

How can I do this in SQL?

Sample data

Created               Editor    Person_ID Costcenter_ID
01.01.2019 00:15:15 - A424521 - X00542  - 71341
01.01.2019 00:18:29 - A424521 - X00456  - 71341
01.01.2019 00:19:05 - A424521 - X00410  - 71341
01.01.2019 00:19:07 - A424521 - X01544  - 71341
01.01.2019 00:19:07 - A424521 - X00455  - 71341
01.01.2019 00:20:47 - A424521 - X00879  - 71341
01.01.2019 00:20:58 - A424521 - X00214  - 71341
01.01.2019 00:21:18 - A424521 - X00458  - 71341
01.01.2019 00:23:57 - A424521 - X00542  - 71341
01.01.2019 00:23:59 - A424521 - X00122  - 71341
01.01.2019 00:24:07 - A424521 - X00542  - 71341

The desired result here is:

01.01.2019 00:23:57 - A424521 - X00542  - 71341
01.01.2019 00:24:07 - A424521 - X00542  - 71341
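To see which (Person_ID, Costcenter_ID) pairs are duplicated at all, a GROUP BY ... HAVING COUNT(*) > 1 query is a common first step. The sketch below is a minimal example, assuming Python's built-in sqlite3 module and an in-memory copy of the sample data; the helpers make_db and duplicated_keys are hypothetical names, and Created is stored as ISO-8601 text (2019-01-01 00:15:15) instead of the DD.MM.YYYY form above so that text comparison would follow chronological order.

```python
import sqlite3

# Hypothetical in-memory copy of the sample data. Created is stored as
# ISO-8601 text (an assumption) so string order equals time order.
ROWS = [
    ("2019-01-01 00:15:15", "A424521", "X00542", 71341),
    ("2019-01-01 00:18:29", "A424521", "X00456", 71341),
    ("2019-01-01 00:19:05", "A424521", "X00410", 71341),
    ("2019-01-01 00:19:07", "A424521", "X01544", 71341),
    ("2019-01-01 00:19:07", "A424521", "X00455", 71341),
    ("2019-01-01 00:20:47", "A424521", "X00879", 71341),
    ("2019-01-01 00:20:58", "A424521", "X00214", 71341),
    ("2019-01-01 00:21:18", "A424521", "X00458", 71341),
    ("2019-01-01 00:23:57", "A424521", "X00542", 71341),
    ("2019-01-01 00:23:59", "A424521", "X00122", 71341),
    ("2019-01-01 00:24:07", "A424521", "X00542", 71341),
]


def make_db():
    """Create an in-memory table named like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personToCostcenter ("
        " Created TEXT, Editor TEXT, Person_ID TEXT, Costcenter_ID INTEGER)"
    )
    conn.executemany("INSERT INTO personToCostcenter VALUES (?, ?, ?, ?)", ROWS)
    return conn


def duplicated_keys(conn):
    """Return one row per key pair that occurs more than once, with its count."""
    return conn.execute(
        """
        SELECT Person_ID, Costcenter_ID, COUNT(*) AS n
        FROM personToCostcenter
        GROUP BY Person_ID, Costcenter_ID
        HAVING COUNT(*) > 1
        """
    ).fetchall()


print(duplicated_keys(make_db()))  # [('X00542', 71341, 3)]
```

This only reports the duplicated pairs, not the rows themselves, and it counts the earliest occurrence too, so on its own it does not produce the desired two-row result.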

2 answers:

Answer 0 (score: 0)

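If created is unique (at least within each person_id/costcenter_id pair), you can use EXISTS to keep only the rows for which a duplicate row with an earlier created timestamp exists. Rows that tie on the earliest timestamp of their pair would all be excluded.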

-- Keep a row only if an earlier row with the same person_id/costcenter_id exists
SELECT *
  FROM personToCostcenter t1
 WHERE EXISTS (SELECT *
                 FROM personToCostcenter t2
                WHERE t2.person_id = t1.person_id
                  AND t2.costcenter_id = t1.costcenter_id
                  AND t2.created < t1.created);
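A quick way to check this EXISTS query is to run it against the sample data. This is a sketch assuming Python's built-in sqlite3 module; make_db and later_duplicates are hypothetical helpers, and Created is stored as ISO-8601 text because a DD.MM.YYYY string would not compare chronologically across days or months.

```python
import sqlite3

# Hypothetical copy of the sample data; ISO-8601 Created is an assumption.
ROWS = [
    ("2019-01-01 00:15:15", "A424521", "X00542", 71341),
    ("2019-01-01 00:18:29", "A424521", "X00456", 71341),
    ("2019-01-01 00:19:05", "A424521", "X00410", 71341),
    ("2019-01-01 00:19:07", "A424521", "X01544", 71341),
    ("2019-01-01 00:19:07", "A424521", "X00455", 71341),
    ("2019-01-01 00:20:47", "A424521", "X00879", 71341),
    ("2019-01-01 00:20:58", "A424521", "X00214", 71341),
    ("2019-01-01 00:21:18", "A424521", "X00458", 71341),
    ("2019-01-01 00:23:57", "A424521", "X00542", 71341),
    ("2019-01-01 00:23:59", "A424521", "X00122", 71341),
    ("2019-01-01 00:24:07", "A424521", "X00542", 71341),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personToCostcenter ("
        " Created TEXT, Editor TEXT, Person_ID TEXT, Costcenter_ID INTEGER)"
    )
    conn.executemany("INSERT INTO personToCostcenter VALUES (?, ?, ?, ?)", ROWS)
    return conn


def later_duplicates(conn):
    """Rows that have an earlier row with the same (Person_ID, Costcenter_ID)."""
    return conn.execute(
        """
        SELECT t1.*
          FROM personToCostcenter t1
         WHERE EXISTS (SELECT 1
                         FROM personToCostcenter t2
                        WHERE t2.Person_ID = t1.Person_ID
                          AND t2.Costcenter_ID = t1.Costcenter_ID
                          AND t2.Created < t1.Created)
         ORDER BY t1.Created
        """
    ).fetchall()


for row in later_duplicates(make_db()):
    print(row)
```

The ORDER BY is only there to make the output deterministic. In a database where Created is a real DATETIME or TIMESTAMP column, the comparison works on the stored type and the text-format caveat does not apply.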

Answer 1 (score: 0)

For each duplicate, there is at least one earlier row with the same data:

-- A later duplicate has at least one earlier row with the same key pair
select * from personToCostcenter p
where exists (
  select 1 from personToCostcenter pp
  where p.Person_ID = pp.Person_ID
    and p.Costcenter_ID = pp.Costcenter_ID
    and p.Created > pp.Created
);
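Because the question is tagged greatest-n-per-group, a window-function variant is another option: number the rows within each (Person_ID, Costcenter_ID) group by Created and keep every row whose number is greater than 1. This is a swapped-in technique, not part of either answer. The sketch assumes Python's sqlite3 module backed by SQLite 3.25 or later (the first release with window functions); make_db and later_duplicates_window are hypothetical names, and Created is again stored as ISO-8601 text.

```python
import sqlite3

# Hypothetical copy of the sample data; ISO-8601 Created is an assumption.
ROWS = [
    ("2019-01-01 00:15:15", "A424521", "X00542", 71341),
    ("2019-01-01 00:18:29", "A424521", "X00456", 71341),
    ("2019-01-01 00:19:05", "A424521", "X00410", 71341),
    ("2019-01-01 00:19:07", "A424521", "X01544", 71341),
    ("2019-01-01 00:19:07", "A424521", "X00455", 71341),
    ("2019-01-01 00:20:47", "A424521", "X00879", 71341),
    ("2019-01-01 00:20:58", "A424521", "X00214", 71341),
    ("2019-01-01 00:21:18", "A424521", "X00458", 71341),
    ("2019-01-01 00:23:57", "A424521", "X00542", 71341),
    ("2019-01-01 00:23:59", "A424521", "X00122", 71341),
    ("2019-01-01 00:24:07", "A424521", "X00542", 71341),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personToCostcenter ("
        " Created TEXT, Editor TEXT, Person_ID TEXT, Costcenter_ID INTEGER)"
    )
    conn.executemany("INSERT INTO personToCostcenter VALUES (?, ?, ?, ?)", ROWS)
    return conn


def later_duplicates_window(conn):
    """Every row except the first (by Created) of each key pair."""
    return conn.execute(
        """
        SELECT Created, Editor, Person_ID, Costcenter_ID
          FROM (SELECT p.*,
                       ROW_NUMBER() OVER (
                           PARTITION BY Person_ID, Costcenter_ID
                           ORDER BY Created
                       ) AS rn
                  FROM personToCostcenter p) AS numbered
         WHERE rn > 1
         ORDER BY Created
        """
    ).fetchall()


for row in later_duplicates_window(make_db()):
    print(row)
```

One difference from the EXISTS queries: if two rows of the same pair share the earliest Created value, ROW_NUMBER() gives 1 to one of them arbitrarily and returns the other, whereas the EXISTS versions return neither.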