I'm trying to find all case-variant duplicates in a users table:
SELECT LOWER(EMAIL), COUNT(EMAIL)
FROM USERS
GROUP BY LOWER(EMAIL)
HAVING COUNT (LOWER(EMAIL)) >= 3;
The result looks something like this:
Emails Count (number of duplicates)
bob@example.com 3
john.smith@example.com 3
blah@example.com 4
james.smith@example.com 3
The problem is that I also need the ID of each email. How can I achieve this? Because of the GROUP BY, I can't simply add it to the SELECT statement:
SELECT ID, LOWER(EMAIL), COUNT(EMAIL)
FROM USERS
GROUP BY ID, LOWER(EMAIL)
HAVING COUNT (LOWER(EMAIL)) >= 3;
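The query above groups on the combination of ID and email, so it looks for duplicates of both together, which is not what I need.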
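To make the difference between the two groupings concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of the asker's database. The table layout, the sample rows, and the assumption that ID is unique per row are all made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the USERS table; ID is assumed unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "bob@example.com"),
        (2, "Bob@example.com"),
        (3, "BOB@example.com"),
        (4, "solo@example.com"),
    ],
)

# Grouping on the lower-cased email alone finds the case-variant duplicates.
by_email = conn.execute(
    "SELECT LOWER(email), COUNT(*) FROM users "
    "GROUP BY LOWER(email) HAVING COUNT(*) >= 3"
).fetchall()

# Adding the unique ID to GROUP BY puts every row in its own group,
# so no group can ever reach a count of 3.
by_id_and_email = conn.execute(
    "SELECT id, LOWER(email), COUNT(*) FROM users "
    "GROUP BY id, LOWER(email) HAVING COUNT(*) >= 3"
).fetchall()

print(by_email)         # [('bob@example.com', 3)]
print(by_id_and_email)  # []
```

With a unique ID every group holds exactly one row, which is why the second query cannot report any duplicates.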
Answer 0 (score: 4)
SELECT ID, EMAIL, LOWER(EMAIL), HOW_MANY
FROM (
SELECT ID, EMAIL, COUNT(*) OVER (PARTITION BY LOWER(EMAIL)) AS HOW_MANY
FROM USERS
)
WHERE HOW_MANY >= 3
ORDER BY ID;
ID         EMAIL                          LOWER(EMAIL)                   HOW_MANY
---------- ------------------------------ ------------------------------ ----------
1          bob@example.com                bob@example.com                3
2          Bob@example.com                bob@example.com                3
3          BOB@example.com                bob@example.com                3
4          john.smith@example.com         john.smith@example.com         3
5          John.smith@example.com         john.smith@example.com         3
6          JOHN.smith@example.com         john.smith@example.com         3
7          blah@example.com               blah@example.com               4
8          BLAH@example.com               blah@example.com               4
9          blAH@example.com               blah@example.com               4
10         BLah@example.com               blah@example.com               4
11         james.smith@example.com        james.smith@example.com        3
12         James.smith@example.com        james.smith@example.com        3
13         JAMES.smith@example.com        james.smith@example.com        3
SQL Fiddle. One nice thing about analytic functions is that this needs only a single pass over the table to get the result.
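For readers who want to try the analytic query without an Oracle instance, the following is a rough sketch that runs the same shape of query through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions). The helper name find_case_variant_duplicates, the lower-case table and column names, and the sample rows are hypothetical.

```python
import sqlite3

def find_case_variant_duplicates(conn, min_count=3):
    """Return (id, email, lower-cased email, group size) for every row
    whose lower-cased email occurs at least min_count times."""
    sql = """
        SELECT id, email, LOWER(email) AS email_lower, how_many
        FROM (
            -- COUNT(*) OVER attaches the group size to each row
            -- without collapsing the rows, so the ID stays available.
            SELECT id, email,
                   COUNT(*) OVER (PARTITION BY LOWER(email)) AS how_many
            FROM users
        ) AS counted
        WHERE how_many >= ?
        ORDER BY id
    """
    return conn.execute(sql, (min_count,)).fetchall()

# Hypothetical sample data: three variants of one address, two of another.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "bob@example.com"),
        (2, "Bob@example.com"),
        (3, "BOB@example.com"),
        (4, "amy@example.com"),
        (5, "AMY@example.com"),
    ],
)

rows = find_case_variant_duplicates(conn)
for row in rows:
    print(row)
```

Only the three bob rows come back here, because the amy group has two members and falls below the threshold of 3.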
Answer 1 (score: 1)
Try this:
SELECT U.*, COUNT(LOWER(EMAIL)) OVER (PARTITION BY (LOWER(EMAIL)))
FROM USERS U WHERE LOWER(EMAIL) IN (SELECT LOWER(EMAIL)
FROM USERS
GROUP BY LOWER(EMAIL)
HAVING COUNT (LOWER(EMAIL)) >= 3);
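As a quick check of this variant, here is a small sketch that runs the same IN-subquery pattern through Python's sqlite3 module (again assuming SQLite 3.25 or newer for the window function). The sample rows and the how_many alias are illustrative additions, not part of the original answer.

```python
import sqlite3

# Hypothetical sample data: groups of size 3, 2 and 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "bob@example.com"), (2, "Bob@example.com"), (3, "BOB@example.com"),
        (4, "amy@example.com"), (5, "AMY@example.com"),
        (6, "blah@example.com"), (7, "BLAH@example.com"),
        (8, "blAH@example.com"), (9, "BLah@example.com"),
    ],
)

# The subquery picks the lower-cased emails that occur at least 3 times;
# the outer query returns every row belonging to one of those groups.
# The window count runs after the WHERE filter, but the filter keeps
# whole groups, so the per-group counts are unchanged.
sql = """
    SELECT u.*, COUNT(*) OVER (PARTITION BY LOWER(u.email)) AS how_many
    FROM users u
    WHERE LOWER(u.email) IN (
        SELECT LOWER(email)
        FROM users
        GROUP BY LOWER(email)
        HAVING COUNT(*) >= 3
    )
    ORDER BY u.id
"""
rows = conn.execute(sql).fetchall()
duplicate_ids = [row[0] for row in rows]
print(duplicate_ids)  # [1, 2, 3, 6, 7, 8, 9]
```

Unlike the purely analytic version, this form references the users table twice, once in the subquery and once in the outer query.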