选择ColA = ColB的行,并添加一个新列,告知ColA中的原始数据是否重复

时间:2011-04-19 20:10:41

标签: sql oracle group-by case having

我试图弄清楚如何查询表(该表实际上是一个结果集,因此它将是一个子查询),按ColA=ColB分组(见下文),并创建一个计算字段全部一步到位。

所以,如果我的测试数据看起来像

ColA    ColB    ColC 
1       1       aaa 
1       2       bbbb    
1       3       cccc    
2       2       dddd    
3       3       eeee    
3       4       ffff    
3       5       gggg    
3       6       hhhh    
4       4       iiii    
5       5       jjjj    
6       6       kkkk    
6       7       llll    
6       8       mmmm    

我只想检索ColA=ColB的行,并添加一个新列,告诉我ColA中的原始数据是否重复。见下文。

ColA   ColB    ColC    multiples
1      1       aaaa    yes
2       2       dddd    no
3       3       eeee    yes
4       4       iiii    no
5       5       jjjj    no
6       6       kkkk    yes

有人可以帮我解决语法问题吗?我一直在玩Group By和SubSelect无济于事。我是否需要对多个字段使用case语句?

4 个答案:

答案 0 :(得分:7)

发布create table和insert语句而不是Desc表更有帮助,并从table_name中选择*; http://tkyte.blogspot.com/2005/06/how-to-ask-questions.html

create table test_repeat(
   cola number,
   colb number,
   colc varchar2(20)
   );

insert into test_repeat values (1,1,'aaa'); 
insert into test_repeat values (1,2,'bbbb');    
insert into test_repeat values (1,3,'cccc');    
insert into test_repeat values (2,2,'dddd');    
insert into test_repeat values (3,3,'eeee'); 
insert into test_repeat values (3,4,'ffff');    
insert into test_repeat values (3,5,'gggg');    
insert into test_repeat values (3,6,'hhhh');    
insert into test_repeat values (4,4,'iiii');    
insert into test_repeat values (5,5,'jjjj');    
insert into test_repeat values (6,6,'kkkk');    
insert into test_repeat values (6,7,'llll');    
insert into test_repeat values (6,8,'mmmm');
commit;

1。您可以使用Oracle分析函数Lead查看结果集,以查看colA是否与下一行相同(在订购之后......),如..

select * from
(select colA, colb,
       (case when colA = (lead(cola) over 
                          (partition by colA order by cola, colb))
            then 'Yes'
            else 'No'
       end) multiples,
       colc
  from test_repeat)
  where colA = colb
/

      COLA       COLB MUL COLC
---------- ---------- --- --------------------
         1          1 Yes aaa
         2          2 No  dddd
         3          3 Yes eeee
         4          4 No  iiii
         5          5 No  jjjj
         6          6 Yes kkkk

2。或者你可以得到COLA每个值的计数并比较它以查看是否有重复...

select a.colA, a.colb, a.colc,
       (case when (select count(*) from test_repeat t where t.cola = a.colA) > 1
            then 'Yes'
            else 'No'
       end) Repeat
  from test_repeat a
  where colA = colB
/

      COLA       COLB COLC                 REP
---------- ---------- -------------------- ---
         1          1 aaa                  Yes
         2          2 dddd                 No
         3          3 eeee                 Yes
         4          4 iiii                 No
         5          5 jjjj                 No
         6          6 kkkk                 Yes

它们都同样简单,但是我建议使用分析函数方法,因为我发现它通常比我过去使用的所有查询更快。

答案 1 :(得分:1)

SQL> select *
  2    from test_repeat
  3   order by cola
  4  /

      COLA       COLB COLC
---------- ---------- --------------------
         1          2 bbbb
         1          1 aaa
         1          3 cccc
         2          2 dddd
         3          4 ffff
         3          3 eeee
         3          5 gggg
         3          6 hhhh
         4          4 iiii
         5          5 jjjj
         6          6 kkkk
         6          7 llll
         6          8 mmmm
         7          9 nnnn

14 rows selected.

SQL> select cola "ColA"
  2       , max(decode(colb,cola,colb)) "ColB"
  3       , max(decode(colb,cola,colc)) "ColC"
  4       , case count(*) when 1 then 'no' else 'yes' end "multiples"
  5    from test_repeat
  6   group by cola
  7  having cola = max(decode(colb,cola,colb))
  8   order by cola
  9  /

      ColA       ColB ColC                 mul
---------- ---------- -------------------- ---
         1          1 aaa                  yes
         2          2 dddd                 no
         3          3 eeee                 yes
         4          4 iiii                 no
         5          5 jjjj                 no
         6          6 kkkk                 yes

6 rows selected.

Performancewise,这个SQL大致等于Rajesh的第一个查询。所以你可以选择一个你更舒服的那个。

SQL> set autotrace on
SQL> select * from
  2  (select colA, colb,
  3         (case when colA = (lead(cola) over
  4                            (partition by colA order by cola, colb))
  5              then 'Yes'
  6              else 'No'
  7         end) multiples,
  8         colc
  9    from test_repeat)
 10    where colA = colb
 11  /

      COLA       COLB MUL COLC
---------- ---------- --- --------------------
         1          1 Yes aaa
         2          2 No  dddd
         3          3 Yes eeee
         4          4 No  iiii
         5          5 No  jjjj
         6          6 Yes kkkk

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1491815685

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   574 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |             |    14 |   574 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"="COLB")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
        421  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> /

      COLA       COLB MUL COLC
---------- ---------- --- --------------------
         1          1 Yes aaa
         2          2 No  dddd
         3          3 Yes eeee
         4          4 No  iiii
         5          5 No  jjjj
         6          6 Yes kkkk

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1491815685

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   574 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |             |    14 |   574 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"="COLB")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        421  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> select cola "ColA"
  2       , max(decode(colb,cola,colb)) "ColB"
  3       , max(decode(colb,cola,colc)) "ColC"
  4       , case count(*) when 1 then 'no' else 'yes' end "multiples"
  5    from test_repeat
  6   group by cola
  7  having cola = max(decode(colb,cola,colb))
  8   order by cola
  9  /

      ColA       ColB ColC                 mul
---------- ---------- -------------------- ---
         1          1 aaa                  yes
         2          2 dddd                 no
         3          3 eeee                 yes
         4          4 iiii                 no
         5          5 jjjj                 no
         6          6 kkkk                 yes

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3021378319

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   532 |     4  (25)| 00:00:01 |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   SORT GROUP BY     |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"=MAX(DECODE("COLB","COLA","COLB")))

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        421  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> /

      ColA       ColB ColC                 mul
---------- ---------- -------------------- ---
         1          1 aaa                  yes
         2          2 dddd                 no
         3          3 eeee                 yes
         4          4 iiii                 no
         5          5 jjjj                 no
         6          6 kkkk                 yes

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3021378319

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |    14 |   532 |     4  (25)| 00:00:01 |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   SORT GROUP BY     |             |    14 |   532 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_REPEAT |    14 |   532 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COLA"=MAX(DECODE("COLB","COLA","COLB")))

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        420  bytes sent via SQL*Net to client
        238  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          6  rows processed

SQL> set autotrace off

的问候,
罗布。

答案 2 :(得分:0)

我认为您需要multiples列的子选择。不需要分组。写下类似的东西:

select ColA, ColB, ColC, 
       (CASE
        WHEN (select b.ColA from thistable b where b.ColA = a.ColA) > 1 THEN 'yes'
        ELSE 'no'
        END)
from thistable a
where ColA = ColB

答案 3 :(得分:0)

我不知道领导功能,非常好。 Rob和Rajesh,感谢到目前为止的答案,我想我能够使用两个子查询和一个外连接来部分回答我自己的问题。

我猜我的查询不如其他人发布的查询效率高,但我想知道人们是否可以对此查询发表评论。它会一直有效吗?可以调整它以提高效率吗? 它仍然需要为倍数输入'yes'或'no',现在它将多重计数或0表示为空。

SELECT BB.ColA, ColB, ColC, AA.CNT 
FROM
(SELECT ColA, count(*) as CNT FROM TMPY where ColA<>ColB Group by ColA)  AA, 
(SELECT ColA, ColB, ColC FROM TMPY WHERE COLA=COLB ) BB 
WHERE
AA.COLa(+)=BB.COLB order by bb.ColA;

再次感谢你!