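I am validating a table that holds transaction-level data from an e-commerce website, and I want to pin down the exact errors.
I would like your help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is: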
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the following query:
Select shoppername,count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1
It returns:
Sam 2
John 2
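But I don't want to find duplicates on just one or two columns. I want to find rows that are duplicated across all the columns in my data. I want the result to be: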
1 Sam 10 A Iphone
Answer 0: (score: 62)
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
You can delete the duplicates by replacing the select statement with:
delete x where rn > 1
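The ROW_NUMBER idea above can be tried out on the question's sample rows. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption, and it needs an SQLite build with window functions, 3.25 or later), and it partitions by all five sample columns rather than just OrderNo and item, so only exact copies get rn > 1.

```python
import sqlite3

# Sample rows from the question; the second row is the exact duplicate.
ROWS = [
    (1, "Sam", 10, "A", "Iphone"),
    (1, "Sam", 10, "A", "Iphone"),
    (1, "Sam", 5, "A", "Ipod"),
    (2, "John", 20, "B", "Macbook"),
    (3, "John", 25, "B", "Macbookair"),
    (4, "Jack", 5, "A", "Ipod"),
]

def extra_copies(rows):
    """Return every row that is a second (or later) copy of an identical row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, "
        "amountPayed INTEGER, city TEXT, item TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH x AS (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY OrderNo, shoppername, amountPayed, city, item
                ORDER BY OrderNo
            ) AS rn
            FROM sales
        )
        SELECT OrderNo, shoppername, amountPayed, city, item, rn
        FROM x
        WHERE rn > 1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

flagged = extra_copies(ROWS)
print(flagged)
```

The first copy in each partition gets rn = 1 and is kept; anything numbered higher is a repeat.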
Answer 1: (score: 40)
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
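To check this against the sample data from the question, here is a small sketch that loads the six rows into an in-memory SQLite database and runs the same GROUP BY / HAVING query. SQLite (via Python's sqlite3 module) is only a stand-in for SQL Server here, and the table is named sales without the dbo schema; both are assumptions made for the demo.

```python
import sqlite3

# Sample rows from the question; the second row is the exact duplicate.
ROWS = [
    (1, "Sam", 10, "A", "Iphone"),
    (1, "Sam", 10, "A", "Iphone"),
    (1, "Sam", 5, "A", "Ipod"),
    (2, "John", 20, "B", "Macbook"),
    (3, "John", 25, "B", "Macbookair"),
    (4, "Jack", 5, "A", "Ipod"),
]

def find_full_row_duplicates(rows):
    """Group by every column and keep only groups with more than one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, "
        "amountPayed INTEGER, city TEXT, item TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS cnt "
        "FROM sales "
        "GROUP BY OrderNo, shoppername, amountPayed, city, item "
        "HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    return result

duplicates = find_full_row_duplicates(ROWS)
print(duplicates)
```

Only the Sam / Iphone row comes back, with a count of 2, which matches the result the question asks for.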
Answer 2: (score: 4)
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Answer 3: (score: 3)
Just add all the fields to the query, and remember to add them to the GROUP BY as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
Answer 4: (score: 3)
To get a list of the records that occur more than once, use the following command:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
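With 50 columns, typing the field list twice is tedious and easy to get wrong, so one option is to generate it. The sketch below does that with Python's sqlite3 module, reading the column names from PRAGMA table_info; on SQL Server the names could be read from INFORMATION_SCHEMA.COLUMNS instead. The helper name build_duplicate_query and the table wide_table are made up for this illustration.

```python
import sqlite3

def build_duplicate_query(conn, table):
    """Build a GROUP BY / HAVING query that lists every column of `table`."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    column_list = ", ".join(f'"{name}"' for name in columns)
    return (
        f"SELECT {column_list}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY {column_list} HAVING COUNT(*) > 1"
    )

conn = sqlite3.connect(":memory:")
# Three columns keep the demo short; the same code works for 50.
conn.execute("CREATE TABLE wide_table (field1 TEXT, field2 TEXT, field3 TEXT)")
conn.executemany(
    "INSERT INTO wide_table VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "d")],
)

query = build_duplicate_query(conn, "wide_table")
dup_rows = conn.execute(query).fetchall()
print(query)
print(dup_rows)
```

The table name is interpolated directly into the SQL text, so this should only be used with names you control.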
Answer 5: (score: 1)
Try this:
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
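Read up on the CHECKSUM function first: different rows can produce the same checksum, so this query can report rows that are not true duplicates.
Answer 6: (score: 0)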
with x as (
select shoppername, count(shoppername) as cnt
from sales
group by shoppername
having count(shoppername) > 1)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername
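Answer 7: (score: 0)
First, I doubt the result shown in the question is accurate: there appear to be three 'Sam' rows in the original table. But that is not important for this question.
Now to the question itself. Based on your table, the best way to show the duplicate values is to use count(*) with a GROUP BY clause. The query looks like this:
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
The reason is that all the columns in your table together identify each record, which means a record counts as a duplicate only when the values in every column are exactly the same. You also want to display all the fields of the duplicate records, so the GROUP BY must not leave out any column; if it does, the query fails, because you can only select columns that take part in the GROUP BY clause.
Now I'd like to give you an example of With...Row_Number()Over(...), which uses a table expression and the Row_Number function.
Suppose you have an almost identical table, but with an extra column named Shipping Date whose value may differ even when all the other columns are the same. Here it is: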
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Note that row 2 is not a duplicate if you still treat all the columns as one unit. But what if you want to treat such rows as duplicates anyway? In that case use With...Row_Number()Over(...), and the query looks like this:
WITH TABLEEXPRESSION
AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]) AS Identifier --if you consider the one with the later shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The query above returns the duplicate together with its shipping date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
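Note that this row differs from the 2016-01-01 row only in its shipping date. The 2016-02-02 row is the one picked out because of PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]: the shipping date is not one of the columns used to decide whether rows are duplicates, so the 2016-02-02 record is still a perfect result for your question.
To summarize: count(*) with a GROUP BY clause is the best choice when the only columns you want to show in the result are the ones in the GROUP BY clause; otherwise you will miss the columns that do not take part in the GROUP BY.
With...Row_Number()Over(...), on the other hand, works in every scenario where you want to find duplicate records, but compared with the former the query is somewhat more complicated to write and a bit over-engineered.
If your goal is to delete duplicate records from the table, you have to use the latter: WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE.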
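As a rough illustration of the delete case, the sketch below removes the later-shipped copy from the Shipping Date table. It runs on SQLite through Python's sqlite3 module rather than SQL Server, which is an assumption made so the example is self-contained. SQL Server lets you delete through the CTE directly; SQLite does not, so the rows to drop are matched by rowid here. The column is named ShippingDate without a space for simplicity.

```python
import sqlite3

# The sample table with the extra shipping date column.
ROWS = [
    (1, "Sam", 10, "A", "Iphone", "2016-01-01"),
    (1, "Sam", 10, "A", "Iphone", "2016-02-02"),
    (1, "Sam", 5, "A", "Ipod", "2016-03-03"),
    (2, "John", 20, "B", "Macbook", "2016-04-04"),
    (3, "John", 25, "B", "Macbookair", "2016-05-05"),
    (4, "Jack", 5, "A", "Ipod", "2016-06-06"),
]

def delete_later_duplicates(rows):
    """Keep the earliest-shipped row of each duplicate group, delete the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, "
        "amountPayed INTEGER, city TEXT, item TEXT, ShippingDate TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM sales
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid, ROW_NUMBER() OVER (
                    PARTITION BY OrderNo, shoppername, amountPayed, city, item
                    ORDER BY ShippingDate
                ) AS Identifier
                FROM sales
            )
            WHERE Identifier > 1
        )
        """
    )
    result = conn.execute("SELECT * FROM sales ORDER BY ShippingDate").fetchall()
    conn.close()
    return result

remaining = delete_later_duplicates(ROWS)
for row in remaining:
    print(row)
```

Five rows are left, and the Sam / Iphone order survives only with its 2016-01-01 shipping date.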
Hope this helps!
Answer 8: (score: 0)
Try this:
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
Answer 9: (score: 0)
You can use either of the following approaches to find the duplicates:
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Answer 10: (score: -2)
select * from dbo.sales group by shoppername having (count(Item) > 1)
Answer 11: (score: -2)
Select EventID, count(*) as cnt from dbo.EventInstances group by EventID having count(*) > 1
Answer 12: (score: -2)
Here is the running code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )