How to get a count of the record difference in the same table where there are distinct and nearly distinct records

Date: 2013-01-08 17:27:41

Tags: sql oracle distinct netezza

I have a table TABLEA with data like this:

field1 field2 field3.......field16
123    10-JAN-12 0.8.......ABC
123    10-JAN-12 0.8.......ABC
.
.
.
123    10-JAN-12 0.7.......ABC
245    11-JAN-12 0.3.......CDE
245    11-JAN-12 0.3.......CDE
245    11-JAN-12 0.3.......XYZ
...
<unique rows>

When I run

select field1, field2, ...field16 
  from TABLEA

I get M records, and when I run

select distinct field1, field2...field16 
  from TABLEA

I get M-x records, where M is in the millions and x is a much smaller number.

I am trying to write SQL that returns those x records (and ultimately just their count). I have tried all of the set-operator keywords, for example

select field1...field16 
 from TABLEA 
 EXCEPT 
 select distinct field1..field16 
   from TABLEA  

or UNION ALL instead of EXCEPT. But none of them returns x; they return 0 rows instead.
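A minimal sketch of why the plain EXCEPT comes back empty. It uses Python's built-in sqlite3 module as a stand-in for Oracle/Netezza, with a hypothetical four-column `tablea` (field1, field2, field3, field16) loaded with the sample rows above. EXCEPT without ALL has set semantics, so every row of the full table is already matched by the DISTINCT list. The count itself is just M minus the number of distinct rows.

```python
import sqlite3

# Hypothetical four-column stand-in for TABLEA, loaded with the question's sample rows.
ROWS = [
    (123, "2012-01-10", 0.8, "ABC"),
    (123, "2012-01-10", 0.8, "ABC"),
    (123, "2012-01-10", 0.7, "ABC"),
    (245, "2012-01-11", 0.3, "CDE"),
    (245, "2012-01-11", 0.3, "CDE"),
    (245, "2012-01-11", 0.3, "XYZ"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (field1 INTEGER, field2 TEXT, field3 REAL, field16 TEXT)")
conn.executemany("INSERT INTO tablea VALUES (?, ?, ?, ?)", ROWS)

# Plain EXCEPT has set semantics: its result is deduplicated, and every row of
# tablea also appears in the DISTINCT list, so nothing is left over.
except_rows = conn.execute("""
    SELECT field1, field2, field3, field16 FROM tablea
    EXCEPT
    SELECT DISTINCT field1, field2, field3, field16 FROM tablea
""").fetchall()

# x is simply M minus the number of distinct rows.
m, distinct_m = conn.execute("""
    SELECT (SELECT COUNT(*) FROM tablea),
           (SELECT COUNT(*)
              FROM (SELECT DISTINCT field1, field2, field3, field16 FROM tablea))
""").fetchone()
x = m - distinct_m
print(len(except_rows), m, distinct_m, x)  # 0 6 4 2
```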

3 answers:

Answer 0 (score: 3)

You can select the non-distinct (duplicated) rows with
 SELECT field1, ... , field16
   FROM tablea
  GROUP BY field1, ... , field16
 HAVING count(*) > 1
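The GROUP BY query lists each duplicated combination once. To get x itself, note that each group contributes count - 1 surplus rows. Here is a sketch under the same assumptions: SQLite via Python's sqlite3, and a hypothetical four-column `tablea` holding the question's sample rows.

```python
import sqlite3

# Assumed stand-in for TABLEA: four of the sixteen fields, sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (field1 INTEGER, field2 TEXT, field3 REAL, field16 TEXT)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?, ?)",
    [(123, "2012-01-10", 0.8, "ABC")] * 2
    + [(123, "2012-01-10", 0.7, "ABC")]
    + [(245, "2012-01-11", 0.3, "CDE")] * 2
    + [(245, "2012-01-11", 0.3, "XYZ")],
)

# Each duplicated group holds (cnt - 1) surplus rows, so summing cnt - 1 over
# the groups kept by HAVING yields x directly (0 if there are no duplicates).
x = conn.execute("""
    SELECT COALESCE(SUM(cnt - 1), 0)
      FROM (SELECT COUNT(*) AS cnt
              FROM tablea
             GROUP BY field1, field2, field3, field16
            HAVING COUNT(*) > 1)
""").fetchone()[0]
print(x)  # 2
```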

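Edit: Another approach is to use the analytic function ROW_NUMBER(), partitioned by all of the field columns. Within each set of identical field values, the first (i.e. distinct) row gets ROW_NUMBER = 1, the second gets 2, the third gets 3, and so on. So you can select the x rows WHERE ROW_NUMBER > 1.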

CREATE TABLE tablea (
    field1 NUMBER, field2 DATE,  field3 NUMBER, field16 VARCHAR2(10)
);

INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.8, 'ABC');
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.8, 'ABC');
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.7, 'ABC');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'CDE');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'CDE');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'XYZ');

Selecting the duplicate (x) rows:

SELECT *
  FROM (
        SELECT field1, field2, field3, field16,
               ROWID AS rid,
               ROW_NUMBER() OVER (PARTITION BY 
               field1, field2, field3, field16 ORDER BY ROWID) as rn
          FROM tablea
        )
  WHERE rn > 1;

 123 10.01.2012 0.8 ABC AAAJ6mAAEAAAAExAAB 2
 245 11.01.2012 0.3 CDE AAAJ6mAAEAAAAExAAE 2
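The same ROW_NUMBER() idea can be wrapped in COUNT(*) to get x directly. This sketch again assumes SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). SQLite's implicit `rowid` stands in for Oracle's ROWID, and the table is the same hypothetical four-column `tablea`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (field1 INTEGER, field2 TEXT, field3 REAL, field16 TEXT)")
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?, ?)",
    [(123, "2012-01-10", 0.8, "ABC")] * 2
    + [(123, "2012-01-10", 0.7, "ABC")]
    + [(245, "2012-01-11", 0.3, "CDE")] * 2
    + [(245, "2012-01-11", 0.3, "XYZ")],
)

# Number the rows inside each group of identical field values; every row with
# rn > 1 is a surplus copy, i.e. one of the x rows.
NUMBERED = """
    SELECT field1, field2, field3, field16,
           ROW_NUMBER() OVER (PARTITION BY field1, field2, field3, field16
                              ORDER BY rowid) AS rn
      FROM tablea
"""
surplus_rows = conn.execute(
    f"SELECT field1, field2, field3, field16 FROM ({NUMBERED}) WHERE rn > 1 ORDER BY field1"
).fetchall()
x = conn.execute(f"SELECT COUNT(*) FROM ({NUMBERED}) WHERE rn > 1").fetchone()[0]
print(surplus_rows)
print(x)  # 2
```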

Answer 1 (score: 1)

You will get what you want with the EXCEPT query you posted above, but you have to add the ALL keyword to it, because EXCEPT DISTINCT is the default. So I just added the ALL keyword to your query:

select field1 ... field16
  from TABLEA
except all
select distinct field1..field16
  from TABLEA

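If you want the count of those x records, use the query above as a subquery in the FROM clause of another query and count in the outer query. That gives you the count like this: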

select count(*)
  from (select field1 ... field16
          from TABLEA
        except all
        select distinct field1..field16
          from TABLEA
       ) B

I guess this is what you're looking for.

Good luck
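SQLite, for one, does not implement EXCEPT ALL. So here is a database-free sketch of the multiset ("bag") semantics this answer relies on, using Python's `collections.Counter`, whose subtraction also keeps only positive counts. The rows are the question's sample rows (four of the sixteen fields). This models the semantics only; it is not how Oracle or Netezza evaluate the query.

```python
from collections import Counter

# Sample rows from the question (field1, field2, field3, field16); two of the
# combinations appear twice.
ROWS = [
    (123, "2012-01-10", 0.8, "ABC"),
    (123, "2012-01-10", 0.8, "ABC"),
    (123, "2012-01-10", 0.7, "ABC"),
    (245, "2012-01-11", 0.3, "CDE"),
    (245, "2012-01-11", 0.3, "CDE"),
    (245, "2012-01-11", 0.3, "XYZ"),
]

# Plain EXCEPT (EXCEPT DISTINCT) behaves like set difference, which is empty
# here because the DISTINCT side contains every row value.
except_distinct = set(ROWS) - set(ROWS)

# EXCEPT ALL: a value occurring k times on the left and once on the right
# survives k - 1 times. Counter subtraction drops zero and negative counts.
except_all = Counter(ROWS) - Counter(set(ROWS))
x = sum(except_all.values())

print(len(except_distinct), x)  # 0 2
```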

Answer 2 (score: 0)

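When both sides select the same columns, no row is missing from the DISTINCT result, so there is nothing for EXCEPT to return. DISTINCT simply lists each possible combination once. A UNION ALL just repeats rows, and an EXCEPT finds nothing because you are only narrowing the rows. What else are you trying to do? Count where the duplicates occur? The answer you got from Wolfgang already does that.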

declare @Table Table ( personID int identity, person varchar(8));

insert into @Table values ('Brett'),('Brett'),('Brett'),('John'),('John'),('Peter');


-- gives me all results
select person
from @Table

-- gives me distinct results (no repeats)
Select distinct person
from @Table


-- gives me nothing: every row in the full set also appears in the distinct set
select person
from @Table 
except 
select distinct person
from @Table

-- shows how many times each value repeats, by grouping on one column and counting the rows in each group. The HAVING clause adds a predicate that filters the groups,
-- in this case to duplicates (groups with more than one row)
Select person, count(*)
from @Table 
group by person
having count(*) > 1 

Edit: If that's what you mean, you can get the difference between the duplicate total and the unique total:

 with dupes as 
    (
    Select count(*) as cnts, sum(count(*)) over() as TotalDupes
    from @Table 
    group by person 
    having count(*) > 1 -- dupes are defined by rows repeating 
    ) 
, uniques as 
    (
    Select count(*) as cnts, sum(count(*)) over() as TotalUniques
    from @Table 
    group by person 
    having count(*) = 1  -- non dupes are rows of only a single resulting row
    )
select distinct TotalDupes - TotalUniques as DifferenceFromRepeatsToUniques
from Dupes, Uniques
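A sketch that reproduces the CTE above in SQLite through Python's sqlite3. It assumes a hypothetical regular table `people` in place of the T-SQL table variable `@Table` (SQLite has no table variables), and it uses plain subqueries instead of `sum(count(*)) over()`. With the Brett/John/Peter data, the duplicate total is 5 and the unique total is 1. For comparison, it also computes the number of surplus rows, SUM(cnt - 1), which is what the question calls x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the T-SQL table variable @Table.
conn.execute("CREATE TABLE people (personID INTEGER PRIMARY KEY, person TEXT)")
conn.executemany(
    "INSERT INTO people (person) VALUES (?)",
    [("Brett",), ("Brett",), ("Brett",), ("John",), ("John",), ("Peter",)],
)

total_dupes, total_uniques, surplus = conn.execute("""
    WITH counts AS (
        SELECT person, COUNT(*) AS cnt FROM people GROUP BY person
    )
    SELECT (SELECT COALESCE(SUM(cnt), 0) FROM counts WHERE cnt > 1),     -- rows in repeated groups
           (SELECT COALESCE(SUM(cnt), 0) FROM counts WHERE cnt = 1),     -- rows that occur once
           (SELECT COALESCE(SUM(cnt - 1), 0) FROM counts WHERE cnt > 1)  -- surplus copies (x)
""").fetchone()

difference = total_dupes - total_uniques
print(total_dupes, total_uniques, difference, surplus)  # 5 1 4 3
```

The difference (4) and the surplus count (3) answer different questions, so which one to use depends on what "the difference" is meant to measure.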