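I'm trying to find duplicates so that I can delete them.
I have a table called categories with the columns uid, qid and value.
uid is the unique ID for the table
qid is a question ID
value is a label for that qid
So each qid can have many rows, but within a qid every value should be unique.
For example: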
mysql> SELECT * FROM categories WHERE qid=6869;
+-------+------+-----------+
| uid   | qid  | value     |
+-------+------+-----------+
| 19838 | 6869 | Sport     |
| 19839 | 6869 | Football  |
| 19840 | 6869 | Sport     |
| 19841 | 6869 | Athletics |
+-------+------+-----------+
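As you can see, it has Sport twice. We have over 8,000 qids in there, each with 3-8 labels... I really don't want to check every qid manually.
So, at a minimum, I'd love to get a list of the qids that have this problem, and ideally delete all the duplicates.
Things I've tried: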
SELECT count(value) AS cnt FROM categories GROUP BY value HAVING cnt>1;
This gave me a table with a lot of numbers in it, but I couldn't get it to print any more columns because of this error:
mysql> SELECT *, count(value) AS cnt FROM categories GROUP BY value HAVING cnt>1;
ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'quizmastershop.categories.uid' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Besides, this isn't what I need anyway, since it only gives me a count for each value...
Any ideas?
Cheers
EDIT: version data
mysql> SELECT VERSION();
+-------------------------+
| VERSION()               |
+-------------------------+
| 5.7.21-0ubuntu0.16.04.1 |
+-------------------------+
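EDIT 2: I removed ONLY_FULL_GROUP_BY from the sql_mode string. The query that produced the error above still doesn't give me anything useful.
EDIT 3: Tried Eric's code, and this is exactly the output I needed :-)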
+-------+------+-------------------+
| uid   | qid  | value             |
+-------+------+-------------------+
|   470 |  170 | Children's        |
|   472 |  170 | Children's        |
|   570 |  204 | Geography         |
|   572 |  204 | Geography         |
|   575 |  205 | Geography         |
|   577 |  205 | Geography         |
Answer 0 (score: 2)
Select * from categories where value in (SELECT value FROM categories GROUP BY value HAVING count(value)>1)
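Note that this matches on value alone, so a label that is used once under each of two different qids would also be returned. A minimal sketch that scopes the check to each qid, assuming MySQL's row-constructor form of IN (which should be available in 5.7) and the categories table from the question:

```sql
-- Rows whose (qid, value) pair occurs more than once.
SELECT *
FROM categories
WHERE (qid, value) IN (
    SELECT qid, value
    FROM categories
    GROUP BY qid, value
    HAVING COUNT(*) > 1
)
ORDER BY qid, value, uid;
```

The ORDER BY is only there to keep each group of duplicates together in the output.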
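Answer 1 (score: 1)
Try the code below. Basically, the inner query grabs the (qid, value) pairs that have more than one entry. The outer query joins that back to the categories table to get the uid.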
SELECT DISTINCT c.uid, c.qid, c.value
FROM categories c
JOIN (
SELECT qid, value, COUNT(*)
FROM categories
GROUP BY qid, value
HAVING COUNT(*) > 1
) a ON a.qid = c.qid AND a.value = c.value
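If the goal is to remove the extra rows as well, one possible follow-up is a self-join DELETE. This is only a sketch: it assumes the row with the lowest uid in each (qid, value) group is the one to keep, and that the table uses a transactional engine such as InnoDB so the change can be rolled back. Back up the table, or run the SELECT above first, before trying it.

```sql
START TRANSACTION;

-- Delete every row that has a twin with the same qid and value
-- but a smaller uid, so the lowest uid in each group survives.
DELETE c1
FROM categories c1
JOIN categories c2
  ON c2.qid = c1.qid
 AND c2.value = c1.value
 AND c2.uid < c1.uid;

-- Should return no rows if the delete did what was intended.
SELECT qid, value, COUNT(*) AS cnt
FROM categories
GROUP BY qid, value
HAVING COUNT(*) > 1;

COMMIT;  -- or ROLLBACK; to undo the delete
```

Once the table is clean, a unique index on (qid, value) would probably prevent new duplicates, though whether it can be added as-is depends on the column types.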