SQL: How to find duplicates based on two fields?

Time: 2010-08-17 15:16:02

Tags: sql oracle unique unique-constraint ora-00918

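I have rows in an Oracle database table which should be unique for a combination of two fields, but the unique constraint was never set up on the table, so I need to find all rows that violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.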

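My table has three relevant columns: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to flush out any rows that violate this with an SQL query.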

I have tried the following SQL (suggested by this previous question), but it doesn't work for me (I get ORA-00918: column ambiguously defined):

SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable 
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

Can someone suggest what I'm doing wrong, and/or how to solve this?

9 answers:

Answer 0: (score: 38)

SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        )
WHERE   rn > 1
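
Note that rn > 1 returns only the second and later rows of each (station_id, obs_year) group, so the first row of every duplicated pair is left out. If you want to see every row involved, a variant along these lines should work. It is only a sketch that swaps ROW_NUMBER() for COUNT(*) OVER, and the alias cnt is my own choice:

```sql
-- Return every row whose (station_id, obs_year) pair occurs more than once.
SELECT  *
FROM    (
        SELECT  t.*,
                COUNT(*) OVER (PARTITION BY station_id, obs_year) AS cnt
        FROM    mytable t
        )
WHERE   cnt > 1
ORDER BY station_id, obs_year, entity_id
```

The ORDER BY is optional; it just keeps the rows of each duplicated pair next to each other.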

Answer 1: (score: 11)

SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 from mytable t2 Where
       t1.station_id = t2.station_id
       AND t1.obs_year = t2.obs_year
       AND t1.RowId <> t2.RowId)

Answer 2: (score: 2)

Rewrite your query as

SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable 
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

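I think the ambiguous column error (ORA-00918) occurs because you are selecting columns whose names appear in both the table and the subquery, but you did not specify whether you want them from dupes or from mytable (aliased as t1).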

Answer 3: (score: 2)

Change the 3 fields in the initial select to

SELECT
t1.entity_id, t1.station_id, t1.obs_year

Answer 4: (score: 1)

Could you create a new table that includes the unique constraint, then copy the data across row by row, ignoring failures?
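
One way this could look in Oracle, as a rough sketch only: instead of a row-by-row loop it uses DML error logging (DBMS_ERRLOG plus the LOG ERRORS clause) so that rejected rows are recorded rather than aborting the insert. The names mytable_uniq, mytable_uniq_uk and the 'dupe check' tag are made up for illustration, and it assumes the table has no column types that error logging cannot handle (LOBs, LONG and similar).

```sql
-- Empty copy of the table (same columns, no rows).
CREATE TABLE mytable_uniq AS
SELECT * FROM mytable WHERE 1 = 0;

-- The constraint the original table was missing.
ALTER TABLE mytable_uniq
  ADD CONSTRAINT mytable_uniq_uk UNIQUE (station_id, obs_year);

-- Creates the error log table ERR$_MYTABLE_UNIQ.
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name => 'MYTABLE_UNIQ');
END;
/

-- Conventional-path insert: rows that violate the constraint are logged
-- instead of failing the whole statement.
INSERT INTO mytable_uniq
SELECT * FROM mytable
LOG ERRORS INTO err$_mytable_uniq ('dupe check') REJECT LIMIT UNLIMITED;

-- The rejected rows: whichever row of each duplicated pair arrived
-- second or later.
SELECT station_id, obs_year, ora_err_mesg$
FROM   err$_mytable_uniq;
```

Keep the insert conventional-path (no APPEND hint), since unique constraint violations are not logged for direct-path inserts.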

Answer 5: (score: 1)

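You need to specify the table for the columns in the main select. Also, assuming entity_id is the unique key of mytable and is therefore irrelevant to finding duplicates, you should not group by it in the dupes subquery.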

Try:

SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
SELECT station_id, obs_year FROM mytable 
GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

Answer 6: (score: 0)

SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        )
WHERE   rn > 1

Quassnoi's query above is the most efficient for large tables. I compared the costs:

SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
WHERE EXISTS (SELECT 1 from trn_refil_book b Where
       a.dist_code = b.dist_code and a.book_date = b.book_date and a.book_no = b.book_no
       AND a.RowId <> b.RowId)
       ;

Cost: 1322341

SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
INNER JOIN (
SELECT b.dist_code, b.book_date, b.book_no FROM trn_refil_book b 
GROUP BY b.dist_code, b.book_date, b.book_no HAVING COUNT(*) > 1) c 
ON 
 a.dist_code = c.dist_code and a.book_date = c.book_date and a.book_no = c.book_no
;

Cost: 1271699

SELECT  dist_code, book_date, book_no
FROM    (
        SELECT  t.dist_code, t.book_date, t.book_no, ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
          ORDER BY t.dist_code) AS rn
        FROM    trn_refil_book t
        ) p
WHERE   p.rn > 1
;

Cost: 1021984

The table was not indexed....
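
The answer does not say how these costs were obtained. One common way to get such figures in Oracle, shown here purely as an assumption about the method, is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY:

```sql
-- Ask the optimizer for a plan without running the query.
EXPLAIN PLAN FOR
SELECT  dist_code, book_date, book_no
FROM    (
        SELECT  t.dist_code, t.book_date, t.book_no,
                ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
                                   ORDER BY t.dist_code) AS rn
        FROM    trn_refil_book t
        ) p
WHERE   p.rn > 1;

-- Show the plan, including the Cost column, for the statement just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Bear in mind that the cost is the optimizer's estimate, not a measured run time, so it is worth timing the candidates on real data as well.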

Answer 7: (score: 0)

  SELECT entity_id, station_id, obs_year
    FROM mytable
GROUP BY entity_id, station_id, obs_year
HAVING COUNT(*) > 1

Specify the fields you want to find duplicates on in both the SELECT and the GROUP BY.

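It works by using GROUP BY to collect rows that match other rows on the specified columns. HAVING COUNT(*) > 1 says that we are only interested in combinations that occur more than once (and are therefore duplicates).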
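
For the table in the question, if entity_id is different on every row (as answer 5 assumes), grouping on it would put each row in its own group and no duplicates would show up. A sketch grouped on just the two key columns may be closer to what is wanted; the occurrences alias is my own:

```sql
-- One row per duplicated (station_id, obs_year) pair, with how often it occurs.
SELECT   station_id, obs_year, COUNT(*) AS occurrences
FROM     mytable
GROUP BY station_id, obs_year
HAVING   COUNT(*) > 1
ORDER BY occurrences DESC
```

This lists the offending pairs rather than the individual rows; join it back to mytable, as in answer 5, to get the entity_id values.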

Answer 8: (score: 0)

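I have a 3-column primary key constraint and needed to find the duplicates, and I think many of the solutions here are cumbersome and hard to understand. So here is an alternative: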

SELECT id, name, value, COUNT(*) FROM db_name.table_name
GROUP BY id, name, value
HAVING COUNT(*) > 1