I have a table with 6,820,483 rows, and many of those rows are duplicates. I found them by running this query:
SELECT player_id, match_id, team_id, count(*)
FROM fixtures
GROUP BY player_id, match_id, team_id
HAVING COUNT(*) > 1
Sample data:
player_id | match_id | team_id
19014 2506172 12573
19014 2506172 12573
19015 2506172 12573
19016 2506172 12573
19016 2506172 12573
19016 2506172 12573
How can I safely delete the duplicates? For the example above, the table should end up looking like this:
player_id | match_id | team_id
19014 2506172 12573
19015 2506172 12573
19016 2506172 12573
Table structure:
CREATE TABLE IF NOT EXISTS `swp`.`fixtures` (
`player_id` INT NOT NULL,
`match_id` INT NOT NULL,
`team_id` INT NOT NULL,
INDEX `player_id_idx` (`player_id` ASC),
INDEX `match_id_idx` (`match_id` ASC),
INDEX `FK_team_fixtures_id_idx` (`team_id` ASC),
CONSTRAINT `FK_player_fixtures_id`
FOREIGN KEY (`player_id`)
REFERENCES `swp`.`player` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_match_fixtures_id`
FOREIGN KEY (`match_id`)
REFERENCES `swp`.`match` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `FK_team_fixtures_id`
FOREIGN KEY (`team_id`)
REFERENCES `swp`.`team` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
Answer 0 (score: 3)
I'm not a MySQL expert, but you could try the following (provided you are sure that no new records will be inserted in the meantime):
CREATE TABLE tmp_fixtures
(
player_id INT NOT NULL,
match_id INT NOT NULL,
team_id INT NOT NULL
);
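-- MySQL has no SELECT ... INTO <table>; INSERT ... SELECT copies the distinct rows instead
INSERT INTO tmp_fixtures (player_id, match_id, team_id)
SELECT DISTINCT
player_id,
match_id,
team_id
FROM fixtures;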
TRUNCATE TABLE fixtures;
To make sure that no duplicate records are created again, you can do the following:
ALTER TABLE fixtures ADD PRIMARY KEY (player_id, match_id, team_id);
After that, repopulate the table and clean up:
INSERT INTO fixtures (player_id, match_id, team_id)
SELECT player_id,
match_id,
team_id
FROM tmp_fixtures;
DROP TABLE tmp_fixtures;
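Since TRUNCATE cannot be rolled back, it may be worth checking the result before the final DROP TABLE step. The following is only a sketch of such a check, assuming tmp_fixtures still exists at that point; the column aliases are illustrative.

```sql
-- Row counts should match: fixtures should now hold exactly the distinct rows.
SELECT
  (SELECT COUNT(*) FROM fixtures)     AS restored_rows,
  (SELECT COUNT(*) FROM tmp_fixtures) AS distinct_rows;

-- The duplicate check from the question should now return an empty result.
SELECT player_id, match_id, team_id, COUNT(*) AS copies
FROM fixtures
GROUP BY player_id, match_id, team_id
HAVING COUNT(*) > 1;
```

If the two counts differ or the second query returns rows, keep tmp_fixtures and investigate before dropping it.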
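Answer 1 (score: 3)
Robert and forpas have both given better answers, but technically I think this can be done without creating a new table (at least in MSSQL). I have tried to translate it to MySQL. Again, I would probably never do it this way, especially on a large data set, but it was a fun exercise.
As with any solution, back up the table first if you really want to try this.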
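DELIMITER //
CREATE PROCEDURE dedupe_fixtures()
BEGIN
DECLARE p, m, t INT;
DECLARE done INT DEFAULT 0;
-- SELECT ... INTO raises NOT FOUND once no duplicated group is left
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
WHILE done = 0 DO
-- pick one (player_id, match_id, team_id) combination that still has duplicates
SELECT player_id, match_id, team_id INTO p, m, t
FROM fixtures
GROUP BY player_id, match_id, team_id
HAVING COUNT(*) > 1
LIMIT 1;
IF done = 0 THEN
-- remove a single copy; the loop repeats until every combination is unique
DELETE FROM fixtures
WHERE player_id = p AND match_id = m AND team_id = t
LIMIT 1;
END IF;
END WHILE;
END //
DELIMITER ;
CALL dedupe_fixtures();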
As the other answers point out, you will probably also want to create a primary key to prevent this from happening again.
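Another way to stay within the same table, which is a different technique from the loop above, is to give the rows a temporary identifier so that identical rows can be told apart. This is only a sketch: tmp_id and tmp_id_uq are hypothetical names, and both ALTER TABLE statements can take a long time on a table of this size.

```sql
-- Sketch only: tmp_id and tmp_id_uq are hypothetical names, not part of the original schema.
-- 1. Number every row so that otherwise identical rows become distinguishable.
ALTER TABLE fixtures
  ADD COLUMN tmp_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD UNIQUE KEY tmp_id_uq (tmp_id);

-- 2. Delete every row that has an identical twin with a smaller tmp_id,
--    which keeps exactly one row (the lowest tmp_id) per combination.
DELETE f1
FROM fixtures AS f1
JOIN fixtures AS f2
  ON  f1.player_id = f2.player_id
  AND f1.match_id  = f2.match_id
  AND f1.team_id   = f2.team_id
  AND f1.tmp_id    > f2.tmp_id;

-- 3. Remove the helper column; its unique index is dropped along with it.
ALTER TABLE fixtures DROP COLUMN tmp_id;
```

The same advice applies here: take a backup first, and make sure nothing else writes to the table while this runs.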
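Answer 2 (score: 1)
There is no other solution than to back up the distinct rows of the table into a temporary table and then restore them, as @Robert Kock suggested, but:
the duplicates can reappear just as they did before.
So, before restoring the table, run this statement: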
ALTER TABLE swp.fixtures ADD PRIMARY KEY(player_id, match_id, team_id);
This adds a multi-column primary key, so the problem cannot occur again.
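To illustrate what the key changes, here is a small sketch. It assumes the primary key is already in place, that the row (19014, 2506172, 12573) is already in fixtures, and that the referenced player, match and team rows exist so the foreign keys are satisfied.

```sql
-- Rejected with error 1062 (duplicate entry for the primary key).
INSERT INTO fixtures (player_id, match_id, team_id)
VALUES (19014, 2506172, 12573);

-- INSERT IGNORE downgrades that error to a warning and skips the row,
-- which can be convenient for bulk loads that may contain repeats.
INSERT IGNORE INTO fixtures (player_id, match_id, team_id)
VALUES (19014, 2506172, 12573);
```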
Edit1
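From: https://dev.mysql.com/doc/refman/8.0/en/ansi-diff-select-into-table.html
MySQL Server doesn't support the SELECT ... INTO TABLE Sybase SQL extension. Instead, MySQL Server supports the INSERT INTO ... SELECT standard SQL syntax, which is basically the same thing. See Section 13.2.6.1, "INSERT ... SELECT Syntax". For example: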
INSERT INTO tbl_temp2 (fld_id)
SELECT tbl_temp1.fld_order_id
FROM tbl_temp1 WHERE tbl_temp1.fld_order_id > 100;
Edit2 (following Gordon Linoff's suggestion)
So the complete code should be:
CREATE TABLE tmp_fixtures AS
SELECT DISTINCT player_id, match_id, team_id FROM fixtures;
TRUNCATE TABLE fixtures;
ALTER TABLE fixtures ADD PRIMARY KEY(player_id, match_id, team_id);
INSERT INTO fixtures (player_id, match_id, team_id)
SELECT player_id, match_id, team_id FROM tmp_fixtures;
DROP TABLE tmp_fixtures;
Use this with caution, and only if you have a backup of your data.
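If no backup exists yet, one lightweight option is a full copy inside the same database, taken before anything is truncated. This is a minimal sketch; fixtures_backup is a hypothetical name, and note that CREATE TABLE ... AS SELECT copies only the data, not the indexes or foreign keys.

```sql
-- fixtures_backup is a hypothetical name; pick one that is unused in your schema.
-- This copies ALL rows, duplicates included, so the original state can be restored.
CREATE TABLE fixtures_backup AS
SELECT player_id, match_id, team_id
FROM fixtures;

-- Sanity check: both counts should be equal before fixtures is modified.
SELECT
  (SELECT COUNT(*) FROM fixtures)        AS original_rows,
  (SELECT COUNT(*) FROM fixtures_backup) AS backup_rows;
```

A copy like this lives on the same server as the original, so it protects against a mistake in the script but not against losing the server itself.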