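Selecting similar records in SQL

Time: 2016-01-29 16:32:01

Tags: sql oracle

After adding a new field to a table, I want to find records that are very similar (for all intents and purposes, duplicates). Here is some sample data.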

+-----+------+-----+------+-------------+----------+
| Id  | Task | Sig | Form | Description | Location |
+-----+------+-----+------+-------------+----------+
| 255 | 5000 |   1 |    1 | Record 1    | (null)   |
| 256 | 5000 |   1 |    1 | Record 1    | 000      |
| 257 | 5001 |   1 |    1 | Record 2    | 0T3      |
| 258 | 5001 |   1 |    2 | Record 3    | 0T3      |
| 259 | 5002 |   1 |    1 | Record 4    | 001      |
| 260 | 5003 |   1 |    1 | Record 5    | 001      |
+-----+------+-----+------+-------------+----------+

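How can I design a query that finds the duplicate records, where the only difference between them is the Location field?

If I use a query like this: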

SELECT *
FROM MY_SAMPLE_TABLE
WHERE Task IN
  (SELECT Task FROM MY_SAMPLE_TABLE
  GROUP BY Task, Sig, Form, Description HAVING COUNT(*) > 1);

Unfortunately, it returns every record that has the same Task. The table contains tens of thousands of records.
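
A likely reason for the over-matching is that the outer WHERE compares only Task, so any row whose Task appears in a duplicated group is returned, whatever its Sig, Form, or Description. A minimal sketch of one possible fix, assuming Oracle's multi-column IN and assuming none of the four compared columns is NULL (a row with a NULL in any of them would never match):

```sql
-- Compare all four grouping columns, not just Task.
SELECT *
FROM MY_SAMPLE_TABLE
WHERE (Task, Sig, Form, Description) IN
  (SELECT Task, Sig, Form, Description
   FROM MY_SAMPLE_TABLE
   GROUP BY Task, Sig, Form, Description
   HAVING COUNT(*) > 1);
```

In the sample data only Ids 255 and 256 share all four values, so those are the rows this form should keep.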

2 Answers:

Answer 0: (score: 1)

A simple approach is to use window functions:

select t.*
from (select s.*, count(*) over (partition by task, sig, form, description) as cnt
      from my_sample_table s
     ) t
where cnt > 1;

If you specifically want the locations to differ, you can use count(distinct):

select t.*
from (select s.*,
             count(distinct location) over (partition by task, sig, form, description) as cnt
      from my_sample_table s
     ) t
where cnt > 1;

If you want to treat NULL as a "distinct" value, the logic is more complicated, because count(distinct) ignores NULLs.
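
One way the NULL case could be handled is sketched below, assuming Oracle analytic functions. The column names non_null_locs and has_null are illustrative and not part of the original answer. The idea is to count the distinct non-NULL locations and add 1 when the group also contains a NULL location:

```sql
select t.*
from (select s.*,
             -- distinct non-NULL locations in the group (NULLs are not counted)
             count(distinct location)
               over (partition by task, sig, form, description) as non_null_locs,
             -- 1 if any row in the group has a NULL location, else 0
             max(case when location is null then 1 else 0 end)
               over (partition by task, sig, form, description) as has_null
      from my_sample_table s
     ) t
where non_null_locs + has_null > 1;
```

In the sample data, Ids 255 and 256 have one non-NULL location ('000') plus one NULL, so the sum is 2 and both rows qualify. Using count(distinct location) alone gives 1 for that group.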

Answer 1: (score: 0)

Presumably, every record has a unique ID.

Since you should have the ID of the new record, just do a join:

-- MySQL syntax (IF). ALL is a reserved word, so the aliases are n (new row) and o (other rows).
SELECT IF(n.id=o.id, 'New', o.id) AS recordnum
, o.task
, o.sig
, o.form
, o.description
, o.location
FROM my_sample_table n
INNER JOIN my_sample_table o
ON n.task=o.task
AND n.sig=o.sig
AND n.form=o.form
AND n.description=o.description
-- AND n.id<>o.id -- optional to exclude the new record from the output
WHERE n.id=$THE_INSERTED_ID
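
IF() is a MySQL function and the question is tagged oracle, so here is a rough Oracle rendering of the same self-join. The bind variable :inserted_id and the aliases n and o are assumptions made for illustration. TO_CHAR is used so that both CASE branches return a character type.

```sql
select case when n.id = o.id then 'New' else to_char(o.id) end as recordnum,
       o.task, o.sig, o.form, o.description, o.location
from my_sample_table n              -- n: the newly inserted row
join my_sample_table o              -- o: every row matching it on the four columns
  on n.task = o.task
 and n.sig = o.sig
 and n.form = o.form
 and n.description = o.description
-- and n.id <> o.id                 -- optional: leave the new row out of the output
where n.id = :inserted_id;
```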

If you don't want to do this at insert time but need to check retrospectively,

SELECT task, sig,form, description, COUNT(*), GROUP_CONCAT(id), GROUP_CONCAT(location)
FROM my_sample_table
GROUP BY task, sig,form, description
HAVING COUNT(*)>1

and use it as a subquery to get the row-level records:

-- ALL is a reserved word, so the derived table is aliased d
SELECT r.*, d.count_dups, d.ids, d.locns
FROM my_sample_table r
INNER JOIN (
   SELECT task, sig, form, description, COUNT(*) AS count_dups,
      GROUP_CONCAT(id) AS ids, GROUP_CONCAT(location) AS locns
   FROM my_sample_table
   GROUP BY task, sig, form, description
   HAVING COUNT(*)>1
) d
ON r.task=d.task
AND r.sig=d.sig
AND r.form=d.form
AND r.description=d.description
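
GROUP_CONCAT is also MySQL-specific. On Oracle 11g Release 2 or later, the grouped query could be sketched with LISTAGG instead. The '(null)' placeholder is an assumption added here so that NULL locations stay visible in the list, since LISTAGG skips NULL values:

```sql
select task, sig, form, description,
       count(*) as count_dups,
       listagg(id, ',') within group (order by id) as ids,
       listagg(nvl(location, '(null)'), ',') within group (order by id) as locns
from my_sample_table
group by task, sig, form, description
having count(*) > 1;
```

This can be joined back to my_sample_table on task, sig, form, and description in the same way as the subquery above, to get the row-level records.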