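I have an InnoDB table with a primary key and a secondary unique key. I used LOAD DATA INFILE to load a very large CSV file (200M records). Afterwards I found duplicate records in the table, which makes no sense to me. How can this happen? I checked that unique_checks is "ON" at both the session and global level.
The LOAD DATA INFILE query I used: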
load data infile "/tmp/test" replace into table temp.test fields terminated by ';' lines terminated by '\t\n' (first_name, last_name, birth_date, doc_number);
The table schema is:
create table test(
id int(10) not null auto_increment,
first_name varchar(30) not null default '',
last_name varchar(30) not null default '',
birth_date datetime null default null,
doc_number int(10) not null default '',
primary key (id, first_name),
unique key (first_name, last_name, birth_date, doc_number)
)
partition by range(id) (
PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (400000000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (700000000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (maxvalue) ENGINE = InnoDB
);
The duplicate records I found:
select * from temp.test where first_name = 'John' and last_name = 'Doe';
+----+------------+-----------+---------------------+------------+
| id | first_name | last_name | birth_date          | doc_number |
+----+------------+-----------+---------------------+------------+
|  3 | John       | Doe       | 1967-05-04 00:00:00 |       1843 |
| 97 | John       | Doe       | 1967-05-04 00:00:00 |       1843 |
+----+------------+-----------+---------------------+------------+
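To get a sense of how widespread the duplication is, a query along the following lines could list every group of rows that share the same unique-key values. This is only a sketch against the schema posted above; the aliases cnt, min_id and max_id are illustrative names, and the LIMIT is an arbitrary choice to keep the output short.

```sql
-- Sketch: list groups of rows sharing the same unique-key column values.
-- Assumes the temp.test schema shown above; cnt, min_id and max_id are
-- illustrative aliases, not existing columns.
SELECT first_name, last_name, birth_date, doc_number,
       COUNT(*) AS cnt,      -- number of rows in the duplicate group
       MIN(id)  AS min_id,   -- lowest id in the group
       MAX(id)  AS max_id    -- highest id in the group
FROM temp.test
GROUP BY first_name, last_name, birth_date, doc_number
HAVING COUNT(*) > 1
ORDER BY cnt DESC
LIMIT 20;
```

Groups where birth_date is NULL may show up here without being a violation, because a MySQL unique index allows multiple rows whose indexed columns contain NULL. Comparing min_id and max_id against the partition boundaries may also show whether the duplicates fall in the same partition or in different ones.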
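I tried to reproduce the problem with a smaller data set, but could not. So I am now trying to reproduce it on the original data set. It still makes no sense to me, because the table has a unique key. Any suggestions or pointers on where to look would be very helpful. Thanks!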