PostgreSQL: find the IDs of rows that contain case-insensitive duplicate strings

Time: 2010-10-14 23:33:14

Tags: sql postgresql case-insensitive deduplication

I want to select, and then delete, the entries in a table that have case-insensitive duplicates.

In other words, the rows are unique... but they stop being unique once you ignore case. They got in while I wasn't looking.

So how do I query the column to find the IDs I should delete? (I'm fine with deleting both of the duplicates.)

A simple sample of the column structure:

player_id | uname
------------------
34        | BOB
544       | bob
etc...

1 answer:

Answer 0: (score: 2)

Players to keep (assuming the one with the lowest player_id registered first):

SELECT min(player_id) as player_id
FROM players
GROUP BY lower(uname)

Use that query as a subquery to list the users to delete alongside their corresponding keepers.

SELECT
    players.player_id AS delete_id,
    players.uname     AS delete_uname,
    keepers.uname     AS keeper_uname,
    keepers.player_id AS keeper_id
FROM players
JOIN (
        -- keeper rows: the lowest player_id in each case-insensitive uname group
        SELECT p.player_id, p.uname
        FROM players p
        JOIN (
            SELECT min(player_id) AS player_id
            FROM players
            GROUP BY lower(uname)
        ) AS keeper_ids
        ON (p.player_id = keeper_ids.player_id)
    ) AS keepers
    ON (lower(players.uname) = lower(keepers.uname) AND players.player_id <> keepers.player_id)
ORDER BY keepers.player_id, players.player_id;

Output:

delete_id | delete_uname | keeper_uname | keeper_id
---------------------------------------------------
544       | bob          | BOB          | 34
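
The question also asks how to delete the duplicates, which the queries above only list. A minimal sketch of one way to do it, assuming the `players` table from the question, that `player_id` is unique and non-null, and that the row with the lowest `player_id` in each case-insensitive group is the one to keep. It uses PostgreSQL's `DELETE ... USING` form; the aliases `d` and `k` are illustrative only.

```sql
-- Sketch only: run inside a transaction so the result can be checked first.
BEGIN;

-- Delete every row d for which some other row k has the same
-- case-folded uname and a smaller player_id. Only the lowest
-- player_id per lower(uname) group survives.
-- Rows whose uname is NULL never match the join condition and are left alone.
DELETE FROM players AS d
USING players AS k
WHERE lower(d.uname) = lower(k.uname)
  AND d.player_id > k.player_id
RETURNING d.player_id, d.uname;

-- Check the returned rows, then COMMIT to keep the change
-- or ROLLBACK to undo it.
COMMIT;
```

Running the listing query from the answer first, and comparing its delete_id column with the rows returned here, is a cheap way to confirm that only the intended rows are removed.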