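In MySQL I want to create a script that generates a list of the tables that contain duplicates.
If I had to find which tables have duplicates by hand, I would go to each table and run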
select count(*) from TableA
and then
select distinct count(*) from TableA
If the two counts are the same, the table has no duplicates; otherwise it does. I can even get the list of table names from information_schema.
select * from information_schema.tables where table_type = 'base table'
I think this may need a stored procedure. I tried:
DELIMITER //
CREATE PROCEDURE duplicates()
BEGIN
set @i = (SELECT COUNT(*) FROM tableA);
set @j = (select distinct count(*) from tableA);
if (@i = @j)
then
select 1;
else
select 0;
end if;
END//
DELIMITER ;
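Can you help me solve this? A completely different approach is fine too.
Answer 0 (score: 0)
I doubt your selects will find duplicates, but if you are happy with them you can build the statements from information_schema.tables and submit them as dynamic SQL (you may find that some of your tables use reserved words as names). In the example below I use a cursor to iterate over the tables and write the results to debug_table.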
drop procedure if exists tablecounts;
delimiter $$
CREATE procedure `tablecounts`()
begin
declare i int;
declare j int;
declare vtable varchar(100);
declare done int default 0;
declare cur cursor for select table_name from information_schema.tables where table_schema = 'sandbox' and table_type = 'base table' and table_name <> 'check';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
open cur;
truncate table debug_table;
cursorloop:loop
fetch cur into vtable;
if done = true then
leave cursorloop;
end if;
set @sql = (concat
(
'insert into debug_table(msg,msg2) '
'select ', char(39),vtable, char(39), ', case when cnt1 <> cnt2 then ', char(39),'duplicates exist',char(39),' else null end msg',
' from (select (select count(*) from ', vtable, ') as cnt1, (select distinct(count(*)) from ', vtable, ') cnt2) s;'
)
);
#select @sql;
prepare sqlstmt from @sql;
execute sqlstmt;
deallocate prepare sqlstmt;
end loop cursorloop;
close cur;
end $$
delimiter ;
call tablecounts();
The debug_table schema:
CREATE TABLE debug_table
(
id int(11) NOT NULL AUTO_INCREMENT,
msg varchar(500) DEFAULT NULL,
msg2 varchar(500) DEFAULT NULL,
PRIMARY KEY (id)
);
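On the reserved-word point: one way to guard against it is to wrap the schema and table names in backticks when building each statement. A minimal sketch, assuming the same sandbox schema as above; the stmt, tbl and cnt aliases are only for illustration, and the generated statements would still have to be run through PREPARE/EXECUTE as in the procedure:

```sql
-- Generate one COUNT(*) statement per base table, with identifiers
-- backtick-quoted so that names such as `check` do not break the SQL.
SELECT CONCAT(
         'SELECT ', QUOTE(table_name), ' AS tbl, COUNT(*) AS cnt ',
         'FROM `', table_schema, '`.`', table_name, '`'
       ) AS stmt
FROM information_schema.tables
WHERE table_schema = 'sandbox'   -- assumed schema name, as in the cursor above
  AND table_type = 'BASE TABLE';
```

This sketch does not cover a table name that itself contains a backtick.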
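Answer 1 (score: 0)
A few points to consider rather than a solution.
First, you will never find duplicate rows in ANY table with a PRIMARY KEY, because by definition a PK is unique.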
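So only tables without a primary key can hold fully identical rows. A sketch of how those candidates might be listed from information_schema, assuming the schema is called sandbox (a placeholder; substitute your own):

```sql
-- List base tables that have no PRIMARY KEY constraint.
SELECT t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema    = t.table_schema
      AND c.table_name      = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = 'sandbox'      -- assumed schema name
  AND t.table_type   = 'BASE TABLE'
  AND c.constraint_name IS NULL;      -- no matching PK row was found
```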
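Second,
select distinct count(*) from TableA;
does not work, because it gives you the DISTINCT results of COUNT(), not the COUNT() of the DISTINCT results. You get the same result as a plain COUNT(*). You need to get the distinct rows first and then count them.
In this example I have a table with 1,000,001 rows. I deliberately added one duplicate to an otherwise unique set of records and dropped the primary key (otherwise there could be no duplicate rows).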
-- REMOVE PK to set up test
mysql> alter table sbtest1 drop column id;
Query OK, 1000001 rows affected
Records: 1000001 Duplicates: 0 Warnings: 0
-- straight COUNT(*) of rows
mysql> select count(*) FROM onemillion.sbtest1;
+----------+
| count(*) |
+----------+
| 1000001 |
+----------+
-- WRONG ANSWER
mysql> select distinct count(*) FROM onemillion.sbtest1;
+----------+
| count(*) |
+----------+
| 1000001 |
+----------+
-- CORRECT ANSWER
mysql> select count(*) FROM (select distinct * from onemillion.sbtest1) a;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (52.39 sec)
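Third, if you look at the time the final query took, you will see that counting rows is not a fast operation.
Fourth, if you decide to leave the PK columns out of consideration and check on the basis of the other columns, how will you handle columns with a UNIQUE constraint, given that these allow NULLs?
Given this table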
CREATE TABLE `table1` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
UNIQUE KEY `b` (`b`)
);
are the values in b 'unique' here?
+------+------+
| a | b |
+------+------+
| 1 | NULL |
| 1 | NULL |
+------+------+
2 rows in set (0.00 sec)
MySQL thinks they are, yet SELECT DISTINCT collapses the two rows into one:
mysql> select count(*) from table1;
+----------+
| count(*) |
+----------+
| 2 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from (select distinct * from table1) a;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
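To see which rows are duplicated rather than just how many, grouping on every column is one option. A sketch against table1 above; the copies alias is only illustrative. Like DISTINCT, GROUP BY puts NULLs in the same group, so the two (1, NULL) rows would be reported as a single group.

```sql
-- Report each distinct (a, b) combination that occurs more than once.
SELECT a, b, COUNT(*) AS copies
FROM table1
GROUP BY a, b
HAVING COUNT(*) > 1;
```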
Here is a solution that tests a single table; it can be called by another procedure that fetches the table names.
DELIMITER //
DROP PROCEDURE IF EXISTS dupes //
CREATE PROCEDURE dupes (IN sname VARCHAR(64), IN tname VARCHAR(64))
BEGIN
DECLARE cols TEXT;
SET @rcount := 0;
SET @dcount := 0;
-- Get all the non PK columns in target table
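-- Backtick-quote each name so reserved words are safe; the ORDER BY
-- has to sit inside GROUP_CONCAT to affect the order of the list
SELECT GROUP_CONCAT(CONCAT('`', `column_name`, '`') ORDER BY `ordinal_position` ASC) INTO cols
FROM `information_schema`.`columns`
WHERE `table_schema` = sname
AND `table_name` = tname
AND `column_key` != 'PRI';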
SET @rsql = CONCAT('SELECT COUNT(*) INTO @rcount FROM `', sname, '`.`', tname, '`');
PREPARE stmt1 FROM @rsql;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
SET @dsql = CONCAT('SELECT COUNT(*) INTO @dcount ',
'FROM (SELECT DISTINCT ', cols , ' ',
'FROM `', sname, '`.`', tname, '`) der');
PREPARE stmt2 FROM @dsql;
EXECUTE stmt2;
DEALLOCATE PREPARE stmt2;
SELECT CONCAT(@rcount, ' rows: ', @rcount - @dcount,
' duplicate(s) found in `', sname, '`.`', tname, '`' ) AS 'Check duplicate rows';
END //
DELIMITER ;
which returns
mysql> call test.dupes('onemillion','sbtest1');
+--------------------------------------------------------------+
| Check duplicate rows |
+--------------------------------------------------------------+
| 1000001 rows: 1 duplicate(s) found in `onemillion`.`sbtest1` |
+--------------------------------------------------------------+
1 row in set (12.88 sec)
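The calling procedure mentioned above could look roughly like this. It is only a sketch: dupes_all is a made-up name, it assumes it is created in the same schema as dupes, and it simply emits one result set per table.

```sql
DELIMITER //
DROP PROCEDURE IF EXISTS dupes_all //
-- Hypothetical driver: runs dupes() for every base table in one schema.
CREATE PROCEDURE dupes_all (IN sname VARCHAR(64))
BEGIN
  DECLARE tname VARCHAR(64);
  DECLARE done INT DEFAULT 0;
  -- Cursor over the table names of the requested schema
  DECLARE cur CURSOR FOR
    SELECT `table_name`
    FROM `information_schema`.`tables`
    WHERE `table_schema` = sname
      AND `table_type` = 'BASE TABLE';
  -- Set the flag when FETCH runs out of rows
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur;
  tables_loop: LOOP
    FETCH cur INTO tname;
    IF done THEN
      LEAVE tables_loop;
    END IF;
    CALL dupes(sname, tname);
  END LOOP tables_loop;
  CLOSE cur;
END //
DELIMITER ;
```

It would be called as, for example, call test.dupes_all('onemillion'). Given the timings above, expect this to be slow on large schemas. A table in which every column is part of the primary key would leave dupes with an empty column list, which this sketch does not handle.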