Create a script to find which tables in a database have duplicate records?

Posted: 2018-03-28 00:38:41

Tags: mysql sql stored-procedures

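In MySQL I want to create a script that produces a list of the tables that contain duplicates. If I had to find which tables have duplicates by hand, I would go to each table and run

select count(*) from TableA

and then

select distinct count(*) from TableA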

If the two counts are the same, the table has no duplicates; otherwise it does. I can also get the list of table names from information_schema:

select * from information_schema.tables  where table_type = 'base table'

I think this may need a stored procedure. I tried:

DELIMITER //
CREATE PROCEDURE duplicates()
BEGIN
  set @i = (SELECT COUNT(*) FROM tableA);
  set @j = (select distinct count(*) from tableA);
  if (@i = @j) then
    select 1;
  else
    select 0;
  end if;
END//
DELIMITER ;

Can you help me with this? A completely different approach would also be fine.

2 answers:

Answer 0 (score: 0)

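I doubt your selects will find duplicates, but if you are happy with them, you can build the statements from information_schema.tables and submit them as dynamic SQL (watch out: some of your table names may be reserved words, so quote them with backticks). In the example below I use a cursor to iterate over the tables and write the results to debug_table.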

drop procedure if exists tablecounts;
delimiter $$
create procedure `tablecounts`()
begin
    declare vtable varchar(100);
    declare done int default 0;
    -- every base table in schema 'sandbox', skipping the table named `check`
    declare cur cursor for
        select table_name
        from information_schema.tables
        where table_schema = 'sandbox'
          and table_type = 'base table'
          and table_name <> 'check';
    declare continue handler for not found set done = true;

    open cur;
    truncate table debug_table;

    cursorloop: loop
        fetch cur into vtable;
        if done = true then
            leave cursorloop;
        end if;

        -- builds, for each table:
        --   insert into debug_table(msg, msg2)
        --   select '<table>', case when cnt1 <> cnt2 then 'duplicates exist' else null end
        --   from (select (select count(*) from `<table>`) as cnt1,
        --                (select distinct(count(*)) from `<table>`) as cnt2) s
        -- (cnt2 repeats the question's check unchanged)
        set @sql = concat(
            'insert into debug_table(msg, msg2) ',
            'select ', char(39), vtable, char(39),
            ', case when cnt1 <> cnt2 then ', char(39), 'duplicates exist', char(39), ' else null end msg',
            ' from (select (select count(*) from `', vtable, '`) as cnt1,',
            ' (select distinct(count(*)) from `', vtable, '`) as cnt2) s'
        );
        -- select @sql;  -- uncomment to print the generated statement

        prepare sqlstmt from @sql;
        execute sqlstmt;
        deallocate prepare sqlstmt;
    end loop cursorloop;

    close cur;
end $$

delimiter ;

call tablecounts();

Schema of debug_table:

CREATE TABLE debug_table (
  id int(11) NOT NULL AUTO_INCREMENT,
  msg varchar(500) DEFAULT NULL,
  MSG2 varchar(500) DEFAULT NULL,
  PRIMARY KEY (id)
);
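The same cursor-and-log loop can be sketched outside MySQL with Python's built-in sqlite3 module, which is only a stand-in here: SQLite has no stored procedures or information_schema, so the hypothetical function below reads table names from sqlite_master and uses DELETE instead of TRUNCATE. Unlike the procedure above, it compares COUNT(*) with the count of DISTINCT rows, and it double-quotes every table name (via a hypothetical quote_ident helper) so that a table called "check" does not break the generated SQL.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an SQLite identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def log_duplicate_tables(conn: sqlite3.Connection) -> None:
    """Check every user table and write one row per table into debug_table."""
    conn.execute("DELETE FROM debug_table")  # SQLite has no TRUNCATE TABLE
    # Materialise the table list before running other statements on conn.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "AND name <> 'debug_table'")]
    for table in tables:
        t = quote_ident(table)  # safe even for reserved words such as "check"
        total = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        distinct = conn.execute(
            f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {t})").fetchone()[0]
        conn.execute("INSERT INTO debug_table (msg, msg2) VALUES (?, ?)",
                     (table, "duplicates exist" if total != distinct else None))
    conn.commit()


# Usage with made-up tables: "check" holds one duplicate row, "clean" holds none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE debug_table (id INTEGER PRIMARY KEY, msg TEXT, msg2 TEXT);
    CREATE TABLE "check" (a INTEGER, b TEXT);
    INSERT INTO "check" VALUES (1, 'x'), (1, 'x'), (2, 'y');
    CREATE TABLE clean (a INTEGER);
    INSERT INTO clean VALUES (1), (2);
""")
log_duplicate_tables(conn)
for row in conn.execute("SELECT msg, msg2 FROM debug_table ORDER BY msg"):
    print(row)  # ('check', 'duplicates exist') then ('clean', None)
```

Parameter binding (`?`) is used for the values written to the log, but identifiers cannot be bound, which is why table names are quoted by hand.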

Answer 1 (score: 0)

A few points to consider, rather than a solution.

First, you will never find duplicate rows in ANY table that has a PRIMARY KEY, because by definition the PK is unique.
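A small sketch of this point, using Python's sqlite3 module as a stand-in for MySQL and a hypothetical table t: the same payload is inserted twice, but each copy receives its own id, so the rows only look duplicated once the PK column is left out of the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is filled in automatically for each inserted row.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO t (name, qty) VALUES (?, ?)",
                 [("apple", 3), ("apple", 3), ("pear", 1)])


def scalar(sql: str) -> int:
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


total = scalar("SELECT COUNT(*) FROM t")
with_pk = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM t)")
without_pk = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT name, qty FROM t)")
print(total, with_pk, without_pk)  # 3 3 2
```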

Second,

select distinct count(*) from TableA;

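does not work, because it gives you the DISTINCT of the COUNT() result, not the COUNT() of the DISTINCT rows. COUNT(*) returns a single row, so DISTINCT has nothing to remove and you get the same result as a straight COUNT(*). You need to get the distinct rows first and then count them.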
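The difference can be reproduced with Python's sqlite3 module standing in for MySQL; the three-row TableA below is made up for illustration and has no primary key, so it can hold an exact duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (a INTEGER, b TEXT)")  # no primary key
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [(1, "x"), (1, "x"), (2, "y")])

# COUNT(*) collapses the table to one row, so DISTINCT has nothing to remove.
straight = conn.execute("SELECT COUNT(*) FROM TableA").fetchone()[0]
distinct_of_count = conn.execute(
    "SELECT DISTINCT COUNT(*) FROM TableA").fetchone()[0]
# Deduplicate first, then count: this is the number the question needs.
count_of_distinct = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM TableA)").fetchone()[0]

print(straight, distinct_of_count, count_of_distinct)  # 3 3 2
print("duplicates" if straight != count_of_distinct else "no duplicates")
```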

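In this example I have a table with 1,000,001 rows. I deliberately added one duplicate to a set of otherwise unique records and dropped the primary key (otherwise there would be no duplicate rows).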

-- REMOVE PK to set up test
mysql> alter table sbtest1 drop column id;
Query OK, 1000001 rows affected
Records: 1000001  Duplicates: 0  Warnings: 0

-- straight COUNT(*) of rows
mysql> select count(*) FROM onemillion.sbtest1;
+----------+
| count(*) |
+----------+
|  1000001 |
+----------+

-- WRONG ANSWER
mysql> select distinct count(*) FROM onemillion.sbtest1;
+----------+
| count(*) |
+----------+
|  1000001 |
+----------+

-- CORRECT ANSWER
mysql> select count(*) FROM (select distinct * from onemillion.sbtest1) a;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (52.39 sec)

Third, if you look at the time the last query took, you will see that counting distinct rows is not a fast operation.

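Fourth, if you decide to leave the PK columns out and check on the basis of the other columns, how will you handle columns with a UNIQUE constraint, given that those allow NULLs?

Given this table: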

CREATE TABLE `table1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  UNIQUE KEY `b` (`b`)
);

are these rows equivalent when the values in b are supposed to be 'unique'?
+------+------+
| a    | b    |
+------+------+
|    1 | NULL |
|    1 | NULL |
+------+------+
2 rows in set (0.00 sec)

MySQL considers them equivalent:

mysql> select count(*) from table1;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from (select distinct * from table1) a;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Update

Here is a solution that checks a single table; it can be called by another procedure that fetches the table names.

DELIMITER //

DROP PROCEDURE IF EXISTS dupes //
CREATE PROCEDURE dupes (IN sname VARCHAR(64), IN tname VARCHAR(64))
BEGIN
  DECLARE cols TEXT;
  SET @rcount := 0;
  SET @dcount := 0;

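-- Get all the non-PK columns of the target table, backtick-quoted, in column order.
-- The result is capped by group_concat_max_len (1024 bytes by default).
  SELECT GROUP_CONCAT(CONCAT('`', `column_name`, '`') ORDER BY `ordinal_position` ASC) INTO cols
  FROM `information_schema`.`columns`
  WHERE `table_schema` = sname
  AND `table_name` = tname
  AND `column_key` != 'PRI';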

  SET @rsql = CONCAT('SELECT COUNT(*) INTO @rcount FROM `', sname, '`.`', tname, '`');

  PREPARE stmt1 FROM @rsql;
  EXECUTE stmt1;
  DEALLOCATE PREPARE stmt1;

  SET @dsql = CONCAT('SELECT COUNT(*) INTO @dcount ', 
                     'FROM (SELECT DISTINCT ', cols , ' ', 
                           'FROM `', sname, '`.`', tname, '`) der');
  PREPARE stmt2 FROM @dsql;
  EXECUTE stmt2;
  DEALLOCATE PREPARE stmt2;

  SELECT CONCAT(@rcount, ' rows: ', @rcount - @dcount, 
                ' duplicate(s) found in `', sname, '`.`', tname, '`' ) AS 'Check duplicate rows';

END //

DELIMITER ;

Returns:

mysql> call test.dupes('onemillion','sbtest1');
+--------------------------------------------------------------+
| Check duplicate rows                                         |
+--------------------------------------------------------------+
| 1000001 rows: 1 duplicate(s) found in `onemillion`.`sbtest1` |
+--------------------------------------------------------------+
1 row in set (12.88 sec)
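The per-table check and the "other procedure" that feeds it table names can be sketched together in Python with sqlite3 as a stand-in for MySQL. This is an analogue, not the procedure itself: PRAGMA table_info replaces information_schema.columns (its last field, pk, is 0 for columns outside the primary key), sqlite_master replaces information_schema.tables, and the function names dupes, check_all and quote_ident as well as the sample tables are hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an SQLite identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def dupes(conn: sqlite3.Connection, table: str) -> str:
    """Compare COUNT(*) with the number of distinct rows over non-PK columns."""
    t = quote_ident(table)
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({t})")
            if row[5] == 0]
    if not cols:
        # Only primary-key columns: rows are unique by definition.
        return f"{table}: no non-PK columns to compare"
    col_list = ", ".join(quote_ident(c) for c in cols)
    rcount = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
    dcount = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {col_list} FROM {t})"
    ).fetchone()[0]
    return f"{rcount} rows: {rcount - dcount} duplicate(s) found in {table}"


def check_all(conn: sqlite3.Connection) -> list:
    """Fetch every user table name and run dupes() on each one."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    return [dupes(conn, name) for name in names]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sbtest1 (id INTEGER PRIMARY KEY, k INTEGER, c TEXT);
    INSERT INTO sbtest1 (k, c) VALUES (1, 'a'), (1, 'a'), (2, 'b');
    CREATE TABLE lookup (code TEXT PRIMARY KEY);
    INSERT INTO lookup VALUES ('x'), ('y');
""")
report = check_all(conn)
for line in report:
    print(line)
# lookup: no non-PK columns to compare
# 3 rows: 1 duplicate(s) found in sbtest1
```

Handling the "only PK columns" case explicitly avoids building a SELECT DISTINCT with an empty column list, which in the MySQL procedure would surface as a NULL cols value and a failed PREPARE.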