Question

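I have to use and maintain an old database schema for a game server... a really bad one. Every column holding data that could contain non-numeric characters was stored as TEXT. I have converted every column to a proper data type, but now I have a problem setting the primary index. It should be id, which holds the unique identification string of a specific user (it is a varchar). Because of the previous lack of indexes, and unfixable bugs caused by several of our game servers (we have plenty, and had more in the past) accessing the same tables, we have some duplicated rows, which makes it impossible to set the column as the primary index.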

I have very little experience with MySQL or SQL in general. I don't know how to write a query that removes the duplicates.

One of our tables has two columns, id and lst (varchar). In this one, because the update queries had no LIMIT, the duplicates are exactly identical rows.

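The other one is a bit more complex. It has the same id column and many more, but three of them matter: id, cur (int) and mdl (varchar). The rule for finding duplicates here is more involved. First, a row whose mdl is anything other than one specific value (say, "default.mdl") is more likely to hold the latest information. Second, the row with the highest cur value is more likely to be the correct one.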

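Based on these rules, I need to keep only the latest (most likely correct) row per id in each of the two tables (each table separately, not across both).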

How can I do this using only SQL?

Edit: The reason I'm not doing this by hand is that each table has ~186,000 rows, and I estimate that 1 in 20 (~9,000) rows are duplicates.

Answer 1

The easiest way is probably to create temporary tables, then copy and move some data around.

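It's a bit hard to tell you exactly what to do since there is no schema to refer to, but hopefully this puts you on the right track. It assumes that the first table you mentioned is named my_table_1 and the second my_table_2, that you have permission to create/drop tables, and that you have already backed up your database (if you haven't backed it up yet, stop right now):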

# First, add what will become the new id column. We'll rename it shortly.
ALTER TABLE `my_table_1`
  ADD `id_new` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

ALTER TABLE `my_table_2`
  ADD `id_new` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

# Next, build the structure to backup the existing values for future reference.
CREATE TABLE `temp_table_backup` (
  `id_orig` varchar( 255 ) NOT NULL,
  `id_new` int( 10 ) NULL DEFAULT NULL,
  `lst` varchar( 255 ) NULL DEFAULT NULL,
  `cur` int( 10 ) NULL DEFAULT NULL,
  `mdl` varchar ( 255 ) NULL DEFAULT NULL
);

# Now copy the old id values to the backup table
INSERT INTO temp_table_backup
  SELECT
    my_table_1.id,
    my_table_1.id_new,
    my_table_1.lst,
    my_table_2.cur,
    my_table_2.mdl
  FROM
    my_table_1
  INNER JOIN
    my_table_2
  ON
    my_table_1.id = my_table_2.id GROUP BY my_table_1.id;

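# Optional: a plain (non-unique) index on id keeps the self-joins below fast.
# It disappears again when the old id column is dropped further down.
ALTER TABLE `my_table_1` ADD INDEX ( `id` );
ALTER TABLE `my_table_2` ADD INDEX ( `id` );

# Create a table to use temporarily. I'm avoiding temporary tables because of the
# complexity of this whole thing. It holds the id_new of every row to delete.
CREATE TABLE `temp_table_1` (
  `id_new` int( 10 ) UNSIGNED NOT NULL,
  PRIMARY KEY ( `id_new` )
);

# Collect the rows of my_table_1 to delete. The duplicates in this table are
# exact copies, so for each id the row with the lowest id_new is kept and
# every later copy is flagged.
INSERT INTO temp_table_1
  SELECT DISTINCT
    p2.id_new
  FROM
    my_table_1 AS p1
  INNER JOIN
    my_table_1 AS p2
  ON
    p1.id = p2.id
  AND
    p1.id_new < p2.id_new;

# Create another table (temporarily) for my_table_2. This one's kinda tricky,
# but "ranks" things according to different criteria. It holds the one row to
# keep for each id.
CREATE TABLE `temp_table_2` (
  `id` varchar( 255 ) NOT NULL,
  `id_new` int( 10 ) UNSIGNED NOT NULL,
  `rank` int( 10 ) NULL DEFAULT NULL,
  `cur` int( 10 ) NULL DEFAULT NULL,
  `mdl` varchar( 255 ) NULL DEFAULT NULL,
  PRIMARY KEY ( `id_new` )
);

# Copy the row to keep for each id. `rank` is 2 for a non-default mdl and 1 for
# 'default.mdl'. A row is kept when no other row with the same id beats it on
# rank, then on cur, then on the lowest id_new as the final tie-breaker.
# This assumes mdl and cur are never NULL. `rank` is quoted because it is a
# reserved word in MySQL 8.0.
INSERT INTO temp_table_2
  SELECT
    t1.id,
    t1.id_new,
    CASE WHEN t1.mdl != 'default.mdl' THEN 2 ELSE 1 END AS `rank`,
    t1.cur,
    t1.mdl
  FROM
    `my_table_2` AS t1
  LEFT JOIN
    `my_table_2` AS t2
  ON
    t2.id = t1.id
  AND
    t2.id_new != t1.id_new
  AND (
      ( t2.mdl != 'default.mdl' ) > ( t1.mdl != 'default.mdl' )
    OR
      ( ( t2.mdl != 'default.mdl' ) = ( t1.mdl != 'default.mdl' )
        AND t2.cur > t1.cur )
    OR
      ( ( t2.mdl != 'default.mdl' ) = ( t1.mdl != 'default.mdl' )
        AND t2.cur = t1.cur
        AND t2.id_new < t1.id_new )
  )
  WHERE
    t2.id_new IS NULL;

# Delete stale values: every flagged copy in my_table_1...
DELETE
  FROM my_table_1
  WHERE id_new IN (SELECT id_new FROM temp_table_1);
# Again, this time keeping only the rows that were copied to temp_table_2.
DELETE
  FROM my_table_2
  WHERE id_new NOT IN (SELECT id_new FROM temp_table_2);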

# Next, drop the old id columns and rename id_new to id
ALTER TABLE
  `my_table_1`
DROP `id`;

ALTER TABLE
  `my_table_1`
CHANGE
  `id_new` `id` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT;

ALTER TABLE
  `my_table_2`
DROP `id`;

ALTER TABLE
  `my_table_2`
CHANGE `id_new` `id` INT( 10 ) UNSIGNED NOT NULL AUTO_INCREMENT;

# Optional. We're done with these tables but you can drop or keep them if you want.
DROP TABLE IF EXISTS temp_table_1;
DROP TABLE IF EXISTS temp_table_2;
DROP TABLE IF EXISTS temp_table_backup;
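
As a possible alternative to the temp-table steps, here is a minimal sketch that removes the duplicates directly. It is not part of the steps above and rests on a few assumptions: MySQL 8.0 or later (for window functions), the table names my_table_1 and my_table_2 used above, the value 'default.mdl' from the question, and that it is run right after the two ADD `id_new` statements, while the original varchar id column still exists. Try it on a copy of the data first.

```sql
# my_table_1: the duplicates are exact copies, so keep the row with the
# lowest id_new for each id and delete every later copy.
DELETE p2
FROM my_table_1 AS p1
JOIN my_table_1 AS p2
  ON p2.id = p1.id
 AND p2.id_new > p1.id_new;

# my_table_2: number the rows of each id from best to worst
# (non-default mdl first, then highest cur, then lowest id_new)
# and delete everything except the best row.
DELETE t
FROM my_table_2 AS t
JOIN (
  SELECT
    id_new,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY (mdl <> 'default.mdl') DESC, cur DESC, id_new ASC
    ) AS rn
  FROM my_table_2
) AS ranked
  ON ranked.id_new = t.id_new
WHERE ranked.rn > 1;

# Sanity check: this should return no rows once the duplicates are gone.
SELECT id, COUNT(*) AS copies
FROM my_table_2
GROUP BY id
HAVING COUNT(*) > 1;
```

If the check comes back empty for both tables, the varchar id column should be free of duplicates, so a unique or primary index on it can be created, provided the rest of the schema allows it.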

Removing duplicate rows in my specific situation?
