我有一个组织的SQL Server数据库,并且有许多重复的行。我想运行一个select语句来获取所有这些和dupe数量,但也返回与每个组织关联的id。
如下的陈述:
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
将返回类似
的内容orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
但我也想抓住他们的身份证。有没有办法做到这一点?也许就像一个
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
原因是还有一个单独的用户表链接到这些组织,我想统一它们(因此删除欺骗,以便用户链接到同一组织而不是欺骗组织)。但是我想手动分配,所以我不会搞砸任何东西,但是我仍然需要一个声明来返回所有dupe orgs的ID,以便我可以查看用户列表。
答案 0 :(得分:302)
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
答案 1 :(得分:88)
您可以运行以下查询并找到max(id)
的重复项并删除这些行。
SELECT orgName, COUNT(*), Max(ID) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
但是你必须运行几次这个查询。
答案 2 :(得分:31)
你可以这样做:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
如果您只想返回可以删除的记录(只留下其中一个),您可以使用:
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
编辑:SQL Server 2000没有ROW_NUMBER()函数。相反,您可以使用:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
答案 3 :(得分:9)
标记为正确的解决方案对我不起作用,但我发现这个答案非常有效:Get list of duplicate rows in MySql
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
答案 4 :(得分:8)
你可以试试这个,最适合你
WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations
)
select * from CTE where RN>1
go
答案 5 :(得分:4)
select * from [Employees]
用于查找重复记录 1)使用CTE
with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte
2)使用GroupBy
select Name,EmailId,COUNT(name) as Duplicate from [Employees] group by Name,EmailId
答案 6 :(得分:4)
如果要删除重复项:
WITH CTE AS(
SELECT orgName,id,
RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
FROM organizations
)
DELETE FROM CTE WHERE RN > 1
答案 7 :(得分:3)
Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1
所以带有rowum的记录&gt; 1将是表格中的重复记录。 '由第一组按记录分区,然后通过给它们序列号序列化它们。 所以rownum&gt; 1将是可以删除的重复记录。
答案 8 :(得分:2)
select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;
答案 9 :(得分:2)
select a.orgName,b.duplicate, a.id
from organizations a
inner join (
SELECT orgName, COUNT(*) AS duplicate
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) b on o.orgName = oc.orgName
group by a.orgName,a.id
答案 10 :(得分:1)
您可以通过多种方式选择duplicate rows
。
对于我的解决方案,首先考虑此表为例
CREATE TABLE #Employee
(
ID INT,
FIRST_NAME NVARCHAR(100),
LAST_NAME NVARCHAR(300)
)
INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
第一个解决方案:
SELECT DISTINCT *
FROM #Employee;
WITH #DeleteEmployee AS (
SELECT ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM #Employee
)
SELECT *
FROM #DeleteEmployee
WHERE RNUM > 1
SELECT DISTINCT *
FROM #Employee
Secound解决方案:使用identity
字段
SELECT DISTINCT *
FROM #Employee;
ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
SELECT *
FROM #Employee
WHERE UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
ALTER TABLE #Employee DROP COLUMN UNIQ_ID
SELECT DISTINCT *
FROM #Employee
并且所有解决方案的结尾都使用此命令
DROP TABLE #Employee
答案 11 :(得分:1)
select orgname, count(*) as dupes, id
from organizations
where orgname in (
select orgname
from organizations
group by orgname
having (count(*) > 1)
)
group by orgname, id
答案 12 :(得分:0)
我想我知道你需要什么 我需要在答案之间混合,我想我得到了他想要的解决方案:
select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
拥有最大ID会给你一个dublicate的id和原始的id,这是他要求的:
id org name , dublicate count (missing out in this case)
id doublicate org name , doub count (missing out again because does not help in this case)
只有你把它以这种形式推出才有可悲的事情
id , name , dubid , name
希望它仍有帮助
答案 13 :(得分:0)
假设我们已经摆好桌子上的表格&#39;学生&#39;有2列:
int startFrom = 0
int limitValue = 100000
int incrementBy = 10000
startFrom.step limitValue, incrementBy, {
println it
}
student_id int
student_name varchar
现在我们想看到重复的记录 使用此查询:
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;
答案 14 :(得分:0)
我有一个更好的选择,可以在表中获取重复的记录
SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname
上述查询的结果显示了所有重复的名称,这些名称具有唯一的学生ID和重复出现的次数
答案 15 :(得分:0)
/*To get duplicate data in table */
SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
答案 16 :(得分:0)
我使用两种方法来查找重复的行。 第一种方法是最著名的一种使用并列方式。 第二种方法是使用CTE-Common Table Expression。
正如@RedFilter所提到的,这种方式也是正确的。很多次我发现CTE方法对我也有用。
WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName)
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1
在上面的示例中,我们通过使用ROW_NUMBER和PARTITION BY查找重复出现来收集结果。然后,我们应用where子句只选择重复计数大于1的行。所有结果都收集到CTE表中,并与Organizations表联接。
来源:CodoBee
答案 17 :(得分:-2)
尝试
SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;