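I have rows in an Oracle database table that should be unique for a combination of two fields, but no unique constraint was ever set on the table, so I need to find all the rows that violate the constraint myself using SQL. Unfortunately, my meager SQL skills aren't up to the task.
My table has three relevant columns: entity_id, station_id, and obs_year. For each row, the combination of station_id and obs_year should be unique, and I want to flush out any rows that violate this with an SQL query.
I tried the following SQL (suggested by this previous question), but it doesn't work for me (I get ORA-00918: column ambiguously defined):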
SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
Can someone suggest what I'm doing wrong, and/or how to solve this?
Answer 0: (score: 38)
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
)
WHERE rn > 1
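To see what this query returns, here is a minimal sketch that runs the same ROW_NUMBER() logic against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in for Oracle, and the sample rows are made up for illustration. Note that rn > 1 returns only the extra rows in each (station_id, obs_year) group, not the first row of the group.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)

# Number the rows inside each (station_id, obs_year) group by entity_id;
# every row numbered 2 or higher is a surplus row for that key.
extras = conn.execute(
    """
    SELECT entity_id, station_id, obs_year, rn
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (
            PARTITION BY station_id, obs_year ORDER BY entity_id
        ) AS rn
        FROM mytable t
    )
    WHERE rn > 1
    ORDER BY entity_id
    """
).fetchall()

print(extras)
```

With this sample data, entity 1 is treated as the row to keep for key (10, 2000), and entities 2 and 5 are reported as violations.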
Answer 1: (score: 11)
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 from mytable t2 Where
t1.station_id = t2.station_id
AND t1.obs_year = t2.obs_year
AND t1.RowId <> t2.RowId)
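A small sketch of the same EXISTS pattern, run from Python against in-memory SQLite as a stand-in for Oracle (SQLite's rowid plays the role of Oracle's RowId here, and the sample rows are made up). Unlike a rn > 1 filter, this form returns every row that shares its key with another row, including the first one.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)

# A row is reported when a *different* physical row has the same key.
all_dupes = conn.execute(
    """
    SELECT entity_id, station_id, obs_year
    FROM mytable t1
    WHERE EXISTS (
        SELECT 1 FROM mytable t2
        WHERE t1.station_id = t2.station_id
          AND t1.obs_year = t2.obs_year
          AND t1.rowid <> t2.rowid
    )
    ORDER BY entity_id
    """
).fetchall()

print(all_dupes)
```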
Answer 2: (score: 2)
Rewrite your query as
SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
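I think the ambiguous column error (ORA-00918) occurs because you select columns whose names appear in both the table and the subquery, but you did not specify whether you want them from dupes or from mytable (aliased as t1).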
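The name-resolution problem can be reproduced outside Oracle. The sketch below uses in-memory SQLite from Python as a stand-in; SQLite reports its own "ambiguous column name" error rather than ORA-00918, and the sample rows (which include a fully duplicated row so the subquery returns something) are made up.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (1, 10, 2000), (2, 10, 2001)],
)

# Same FROM/JOIN clause as the query in the question.
join_clause = """
    FROM mytable t1
    INNER JOIN (
        SELECT entity_id, station_id, obs_year FROM mytable
        GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
    ON t1.station_id = dupes.station_id AND t1.obs_year = dupes.obs_year
"""

# Unqualified column names exist in both t1 and dupes, so the query is rejected.
try:
    conn.execute("SELECT entity_id, station_id, obs_year" + join_clause)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying every selected column with t1 removes the ambiguity.
rows = conn.execute(
    "SELECT t1.entity_id, t1.station_id, t1.obs_year" + join_clause
).fetchall()

print(error_message)
print(rows)
```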
Answer 3: (score: 2)
Change the 3 fields in the initial select to
SELECT
t1.entity_id, t1.station_id, t1.obs_year
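Answer 4: (score: 1)
Could you create a new table that includes the unique constraint, and then copy the data across row by row, ignoring failures?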
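A rough sketch of that idea, using in-memory SQLite from Python as a stand-in for Oracle. The table name mytable_clean and the sample rows are hypothetical. Each insert that violates the unique constraint is caught and the offending row is collected, so the rejected list doubles as a report of the violations.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)

# Hypothetical target table carrying the constraint the original lacks.
conn.execute(
    """
    CREATE TABLE mytable_clean (
        entity_id INTEGER, station_id INTEGER, obs_year INTEGER,
        UNIQUE (station_id, obs_year)
    )
    """
)

rejected = []
source_rows = conn.execute(
    "SELECT entity_id, station_id, obs_year FROM mytable ORDER BY entity_id"
).fetchall()
for row in source_rows:
    try:
        conn.execute("INSERT INTO mytable_clean VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        # The key already exists in the clean table: record and skip the row.
        rejected.append(row)

kept = conn.execute("SELECT COUNT(*) FROM mytable_clean").fetchone()[0]
print(kept, rejected)
```

Which row of each group survives depends on the copy order, so the ORDER BY in the source query matters.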
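Answer 5: (score: 1)
You need to specify the table for the columns in the main select. Also, assuming entity_id is the unique key of mytable and is irrelevant to finding duplicates, you should not group on it in the dupes subquery.
Try: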
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
SELECT station_id, obs_year FROM mytable
GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
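To illustrate why the grouping columns matter, the sketch below runs the join with both GROUP BY lists against in-memory SQLite from Python (a stand-in for Oracle). The helper find_dupes and the sample rows are made up for this example, and entity_id is assumed to be unique.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)


def find_dupes(group_cols):
    """Hypothetical helper: run the join with the given GROUP BY column list."""
    sql = f"""
        SELECT t1.entity_id, t1.station_id, t1.obs_year
        FROM mytable t1
        INNER JOIN (
            SELECT {group_cols} FROM mytable
            GROUP BY {group_cols} HAVING COUNT(*) > 1) dupes
        ON t1.station_id = dupes.station_id AND
           t1.obs_year = dupes.obs_year
        ORDER BY t1.entity_id
    """
    return conn.execute(sql).fetchall()


# Grouping on the unique entity_id makes every group a single row: no hits.
with_entity = find_dupes("entity_id, station_id, obs_year")
# Grouping only on the key columns finds the real violations.
without_entity = find_dupes("station_id, obs_year")

print(with_entity)
print(without_entity)
```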
Answer 6: (score: 0)
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
)
WHERE rn > 1
The query above, by Quassnoi, is the most efficient for large tables. I did a cost analysis:
SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
WHERE EXISTS (SELECT 1 from trn_refil_book b Where
a.dist_code = b.dist_code and a.book_date = b.book_date and a.book_no = b.book_no
AND a.RowId <> b.RowId)
;
has a cost of 1322341
SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
INNER JOIN (
SELECT b.dist_code, b.book_date, b.book_no FROM trn_refil_book b
GROUP BY b.dist_code, b.book_date, b.book_no HAVING COUNT(*) > 1) c
ON
a.dist_code = c.dist_code and a.book_date = c.book_date and a.book_no = c.book_no
;
has a cost of 1271699
whereas
SELECT dist_code, book_date, book_no
FROM (
SELECT t.dist_code, t.book_date, t.book_no, ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
ORDER BY t.dist_code) AS rn
FROM trn_refil_book t
) p
WHERE p.rn > 1
;
has a cost of 1021984
The table was not indexed....
Answer 7: (score: 0)
SELECT entity_id, station_id, obs_year
FROM mytable
GROUP BY entity_id, station_id, obs_year
HAVING COUNT(*) > 1
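Specify the fields you want to find duplicates on in both the SELECT and the GROUP BY.
It works by using GROUP BY to find any rows that match any other rows based on the specified columns. HAVING COUNT(*) > 1 says that we are only interested in seeing rows that occur more than once (and are therefore duplicates).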
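Applied to the two key columns from the question, the GROUP BY / HAVING pattern yields one row per duplicated key along with how many times it occurs. This is a small sketch using in-memory SQLite from Python as a stand-in for Oracle, with made-up sample rows.

```python
import sqlite3

# SQLite stands in for Oracle here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)

# One output row per (station_id, obs_year) key that occurs more than once.
dup_keys = conn.execute(
    """
    SELECT station_id, obs_year, COUNT(*)
    FROM mytable
    GROUP BY station_id, obs_year
    HAVING COUNT(*) > 1
    """
).fetchall()

print(dup_keys)
```

This reports the offending keys but not the individual rows; joining the result back to the table recovers the rows themselves.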
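Answer 8: (score: 0)
Because I had a 3-column primary key constraint and needed to find the duplicates, I found many of the solutions here cumbersome and hard to understand. So here's an option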
SELECT id, name, value, COUNT(*) FROM db_name.table_name
GROUP BY id, name, value
HAVING COUNT(*) > 1