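After adding a new field to a table, I want to find records that are nearly identical (for all intents and purposes, duplicates). Here is some sample data.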
+-----+------+-----+------+-------------+----------+
| Id | Task | Sig | Form | Description | Location |
+-----+------+-----+------+-------------+----------+
| 255 | 5000 | 1 | 1 | Record 1 | (null) |
| 256 | 5000 | 1 | 1 | Record 1 | 000 |
| 257 | 5001 | 1 | 1 | Record 2 | 0T3 |
| 258 | 5001 | 1 | 2 | Record 3 | 0T3 |
| 259 | 5002 | 1 | 1 | Record 4 | 001 |
| 260 | 5003 | 1 | 1 | Record 5 | 001 |
+-----+------+-----+------+-------------+----------+
How can I write a query that finds "duplicate" records whose only difference is the Location field?
I tried a query like this:
SELECT *
FROM MY_SAMPLE_TABLE
WHERE Task IN
(SELECT Task FROM MY_SAMPLE_TABLE
GROUP BY Task, Sig, Form, Description HAVING COUNT(*) > 1);
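Unfortunately, it returns every record that shares a Task value with a duplicate group, not only the duplicates themselves. The table contains tens of thousands of records.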
Answer 0: (score: 1)
A simple approach is to use a window function:
select t.*
from (select t.*, count(*) over (partition by task, sig, form, description) as cnt
      from my_sample_table t  -- the inner alias is needed so that t.* resolves
     ) t
where cnt > 1;
If you specifically want the locations to differ, you can use count(distinct):
select t.*
from (select t.*,
             count(distinct location) over (partition by task, sig, form, description) as cnt
      from my_sample_table t  -- the inner alias is needed so that t.* resolves
     ) t
where cnt > 1;
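Not every database accepts DISTINCT inside a window function; Oracle does, while PostgreSQL and SQL Server, for example, reject it. As a sketch of a workaround, assuming the same table and columns, you can compare the smallest and largest location in each group, since the two differ exactly when the group holds more than one distinct non-NULL location. The aliases s, min_loc, and max_loc are introduced here for illustration only.

```sql
-- Sketch: same result as count(distinct location) > 1, without DISTINCT in a window.
select t.*
from (select s.*,
             -- smallest and largest non-NULL location within each candidate group
             min(location) over (partition by task, sig, form, description) as min_loc,
             max(location) over (partition by task, sig, form, description) as max_loc
      from my_sample_table s
     ) t
-- min and max differ only if the group has at least two distinct locations
where min_loc <> max_loc;
```

Like count(distinct), min and max skip NULL, so this version also does not treat a NULL location as a difference. The output carries the two helper columns in addition to the table's own.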
If you want to treat NULL as a "different" value, the logic is more complicated.
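One possible sketch of that extra logic, assuming a database that supports count(distinct) as a window function, as the query above does: because count(distinct) skips NULL, add 1 whenever the group contains at least one NULL location. The alias s is introduced here for illustration.

```sql
-- Sketch: count NULL as one more distinct location value.
select t.*
from (select s.*,
             -- number of distinct non-NULL locations in the group
             count(distinct location) over (partition by task, sig, form, description)
             -- plus 1 if any row in the group has a NULL location
             + max(case when location is null then 1 else 0 end)
                 over (partition by task, sig, form, description) as cnt
      from my_sample_table s
     ) t
where cnt > 1;
```

With the sample data in the question, this should return the rows with Id 255 and 256, whose locations are NULL and 000.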
Answer 1: (score: 0)
Presumably, each record has a unique ID.
Since you should have the ID of the new record, just do a join:
-- ALL is a reserved word, so the aliases are new_rec and all_rec
SELECT IF(new_rec.id = all_rec.id, 'New', all_rec.id) AS recordnum
     , all_rec.task
     , all_rec.sig
     , all_rec.form
     , all_rec.description
     , all_rec.location
FROM my_sample_table new_rec
INNER JOIN my_sample_table all_rec
   ON new_rec.task = all_rec.task
  AND new_rec.sig = all_rec.sig
  AND new_rec.form = all_rec.form
  AND new_rec.description = all_rec.description
-- AND new_rec.id <> all_rec.id -- optional: exclude the new record from the output
WHERE new_rec.id = $THE_INSERTED_ID;
If you don't want to do this at insert time but need to check retroactively:
SELECT task, sig, form, description, COUNT(*), GROUP_CONCAT(id), GROUP_CONCAT(location)
FROM my_sample_table
GROUP BY task, sig, form, description
HAVING COUNT(*) > 1;
and use it as a subquery to get the row-level records:
-- ALL is a reserved word, so the derived table is aliased as dups
SELECT r.*, dups.count_dups, dups.ids, dups.locns
FROM my_sample_table r
INNER JOIN (
    SELECT task, sig, form, description, COUNT(*) AS count_dups,
           GROUP_CONCAT(id) AS ids, GROUP_CONCAT(location) AS locns
    FROM my_sample_table
    GROUP BY task, sig, form, description
    HAVING COUNT(*) > 1
) dups
   ON r.task = dups.task
  AND r.sig = dups.sig
  AND r.form = dups.form  -- join on every grouped column, including form
  AND r.description = dups.description;