Question

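In MySQL I want to create a script that produces a list of the tables that contain duplicates. If I had to find out by hand which tables have duplicates, I would go to each table and run select count(*) from TableA
and then select distinct count(*) from TableA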

If the two results are the same, the table has no duplicates; otherwise it has. I can even get the list of table names from information_schema.

select * from information_schema.tables  where table_type = 'base table'

I think this may need a stored procedure. I tried:

    DELIMITER //
    CREATE PROCEDURE duplicates()
    BEGIN
        set @i = (SELECT COUNT(*) FROM tableA);
        set @j = (select distinct count(*) from tableA);
        if (@i = @j) then
            select 1;
        else
            select 0;
        end if;
    END//
    DELIMITER ;

Can you help me solve this? A completely different approach is fine too.

Answer 1

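I suspect your selects will not find duplicates, but if you are happy with them you can build the statements from information_schema.tables and submit them as dynamic SQL (you may find that some of your tables use reserved words as names). In the example below I use a cursor to iterate over the tables and write the results to debug_table.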

drop procedure if exists tablecounts;
delimiter $$
CREATE  procedure `tablecounts`()

begin

declare  i int;
declare   j int;
declare   vtable varchar(100);
declare done int default 0;
declare cur cursor for select table_name from information_schema.tables where table_schema = 'sandbox'  and table_type = 'base table' and table_name <> 'check';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
open cur;
truncate table debug_table;
cursorloop:loop
    fetch cur into vtable;
    if done = true then  
        leave cursorloop;
    end if;  
    set @sql = (concat
     (
     'insert into debug_table(msg,msg2) '
     'select ', char(39),vtable, char(39), ', case when cnt1 <> cnt2 then ', char(39),'duplicates exist',char(39),' else null end msg',
     ' from (select (select count(*) from ', vtable, ') as cnt1, (select distinct(count(*)) from ', vtable, ') cnt2) s;'
     )
     );
     #select @sql;

    prepare sqlstmt from @sql;
    execute sqlstmt;
    deallocate prepare sqlstmt;

end loop cursorloop;
close cur;


end $$

delimiter ;

call tablecounts();

debug_table schema:

    CREATE TABLE debug_table (
      id int(11) NOT NULL AUTO_INCREMENT,
      msg varchar(500) DEFAULT NULL,
      msg2 varchar(500) DEFAULT NULL,
      PRIMARY KEY (id)
    );
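The remark about reserved words can be handled by quoting the table name with backticks when the statement is built, instead of excluding such tables from the cursor. A minimal sketch, assuming a table literally named `check` exists in the current database; the variable names `@tbl` and `@q` are only for illustration:

```sql
-- Hypothetical table name that collides with a reserved word
SET @tbl = 'check';

-- Wrap the name in backticks; double any backtick inside the name itself
SET @q = CONCAT('SELECT COUNT(*) FROM `', REPLACE(@tbl, '`', '``'), '`');

-- Inspect the generated text: SELECT COUNT(*) FROM `check`
SELECT @q;

-- Run it (this step assumes the table really exists)
PREPARE stmt FROM @q;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The same CONCAT pattern could replace the bare `vtable` references in the procedure above.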

Answer 2

A few points to consider rather than a solution.

First, you will never find duplicate rows in ANY table with a PRIMARY KEY, because a PK is unique by definition.
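Following from that point, the search can be narrowed to base tables that have no primary key at all. A minimal sketch, assuming the schema is called `sandbox` (the name used in the other answer); it relies only on the standard `information_schema.tables` and `information_schema.table_constraints` views:

```sql
-- List base tables in one schema that have no PRIMARY KEY constraint.
-- 'sandbox' is an assumed schema name; substitute your own.
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema = 'sandbox'
  AND c.constraint_name IS NULL;  -- no matching PK row was found
```

Only the tables returned here can hold rows that are identical in every column.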

Second,

select distinct count(*) from TableA;

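does not work, because it gives you the DISTINCT of the result of COUNT(), not the COUNT() of the DISTINCT rows. You get the same result as a plain COUNT(*). You need to get the distinct rows first and then count them.

In this example I have a table of 1,000,001 rows. I deliberately added one duplicate to an otherwise unique set of records and dropped the primary key (otherwise there would be no duplicate rows).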

    -- REMOVE PK to set up test
    mysql> alter table sbtest1 drop column id;
    Query OK, 1000001 rows affected
    Records: 1000001  Duplicates: 0  Warnings: 0

    -- straight COUNT(*) of rows
    mysql> select count(*) FROM onemillion.sbtest1;
    +----------+
    | count(*) |
    +----------+
    |  1000001 |
    +----------+

    -- WRONG ANSWER
    mysql> select distinct count(*) FROM onemillion.sbtest1;
    +----------+
    | count(*) |
    +----------+
    |  1000001 |
    +----------+

    -- CORRECT ANSWER
    mysql> select count(*) FROM (select distinct * from onemillion.sbtest1) a;
    +----------+
    | count(*) |
    +----------+
    |  1000000 |
    +----------+
    1 row in set (52.39 sec)

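Third, if you look at how long the final query took, you will see that counting rows is not a fast operation.

Fourth, if you decide to leave the PK column out of consideration and check on the basis of the other columns, how will you treat columns with a UNIQUE constraint, given that these allow NULLs?

Given this table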

    CREATE TABLE `table1` (
      `a` int(11) DEFAULT NULL,
      `b` int(11) DEFAULT NULL,
      UNIQUE KEY `b` (`b`)
    );

are these rows equivalent when the values in b are 'unique'?

    +------+------+
    | a    | b    |
    +------+------+
    |    1 | NULL |
    |    1 | NULL |
    +------+------+
    2 rows in set (0.00 sec)

MySQL thinks they are

    mysql> select count(*) from table1;
    +----------+
    | count(*) |
    +----------+
    |        2 |
    +----------+
    1 row in set (0.00 sec)

    mysql> select count(*) from (select distinct * from table1) a;
    +----------+
    | count(*) |
    +----------+
    |        1 |
    +----------+
    1 row in set (0.00 sec)
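GROUP BY treats NULLs the same way DISTINCT does, so the same table can also be asked which rows are duplicated rather than only how many. A small sketch, assuming `table1` exists as defined above and holds just the two rows shown; the alias `copies` is illustrative:

```sql
-- List each distinct (a, b) combination that occurs more than once.
-- NULLs in b fall into one group, matching the DISTINCT behaviour above.
SELECT a, b, COUNT(*) AS copies
FROM table1
GROUP BY a, b
HAVING COUNT(*) > 1;
-- With the two rows shown above this returns one row: a = 1, b = NULL, copies = 2
```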

Update

Here is a solution that tests a single table; it can be called by another procedure that fetches the table names.

    DELIMITER //
    DROP PROCEDURE IF EXISTS dupes //
    CREATE PROCEDURE dupes (IN sname VARCHAR(64), IN tname VARCHAR(64))
    BEGIN
      DECLARE cols TEXT;
      SET @rcount := 0;
      SET @dcount := 0;

      -- Get all the non PK columns in target table, backtick-quoted.
      -- ORDER BY sits inside GROUP_CONCAT so that it actually orders the list.
      SELECT GROUP_CONCAT(CONCAT('`', `column_name`, '`') ORDER BY `ordinal_position` ASC)
        INTO cols
        FROM `information_schema`.`columns`
       WHERE `table_schema` = sname
         AND `table_name` = tname
         AND `column_key` != 'PRI';

      -- Count all rows
      SET @rsql = CONCAT('SELECT COUNT(*) INTO @rcount FROM `', sname, '`.`', tname, '`');
      PREPARE stmt1 FROM @rsql;
      EXECUTE stmt1;
      DEALLOCATE PREPARE stmt1;

      -- Count distinct rows over the non PK columns
      SET @dsql = CONCAT('SELECT COUNT(*) INTO @dcount ',
                         'FROM (SELECT DISTINCT ', cols, ' ',
                         'FROM `', sname, '`.`', tname, '`) der');
      PREPARE stmt2 FROM @dsql;
      EXECUTE stmt2;
      DEALLOCATE PREPARE stmt2;

      SELECT CONCAT(@rcount, ' rows: ', @rcount - @dcount,
                    ' duplicate(s) found in `', sname, '`.`', tname, '`')
             AS 'Check duplicate rows';
    END //
    DELIMITER ;

Returns

    mysql> call test.dupes('onemillion','sbtest1');
    +--------------------------------------------------------------+
    | Check duplicate rows                                         |
    +--------------------------------------------------------------+
    | 1000001 rows: 1 duplicate(s) found in `onemillion`.`sbtest1` |
    +--------------------------------------------------------------+
    1 row in set (12.88 sec)
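The calling procedure mentioned above is not shown in the answer. One possible shape for it is sketched here, with the hypothetical name `dupes_all`, assuming it is created in the same database as `dupes`. It walks the base tables of one schema with a cursor and calls `dupes` for each:

```sql
DELIMITER //
DROP PROCEDURE IF EXISTS dupes_all //
-- dupes_all is a hypothetical wrapper, not part of the original answer
CREATE PROCEDURE dupes_all (IN sname VARCHAR(64))
BEGIN
  DECLARE tname VARCHAR(64);
  DECLARE done INT DEFAULT FALSE;
  -- Every base table in the requested schema
  DECLARE cur CURSOR FOR
    SELECT `table_name`
      FROM `information_schema`.`tables`
     WHERE `table_schema` = sname
       AND `table_type` = 'BASE TABLE';
  -- Flag the end of the cursor instead of raising an error
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

  OPEN cur;
  tableloop: LOOP
    FETCH cur INTO tname;
    IF done THEN
      LEAVE tableloop;
    END IF;
    -- One result set per table, as produced by dupes
    CALL dupes(sname, tname);
  END LOOP tableloop;
  CLOSE cur;
END //
DELIMITER ;
```

It would be invoked as `call dupes_all('onemillion');`. Two caveats under these assumptions: each table is scanned in full, so the run time adds up across a large schema, and a table whose columns all belong to the primary key leaves `cols` NULL inside `dupes`, so such tables would need to be skipped or handled first.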

Create a script to find which tables in the database have duplicate records?
