Multi-Row function to filter out Duplicates

Time: 2018-04-18 17:48:36

Tags: sql oracle filter flags multirow

I'm relatively new to SQL, so I would like your help with the following case.

I have the following table (just a sample):

| id | FName_LVL1  | LName_LVL1 | FName_LVL2 | LName_LVL2  |
|----|-------------|------------|------------|-------------|
| 1  | John        | Kennedy    | Marc       | Guy         |
| 2  | John        | Kennedy    | Olivier    | Oslo        |
| 3  | Mike        | Lanes      | Patrick    | James       |

I would like to isolate the rows whose FName_LVL1 and LName_LVL1 values are duplicated, so that the table looks like this:

| id | FName_LVL1  | LName_LVL1 | FName_LVL2 | LName_LVL2  |
|----|-------------|------------|------------|-------------|
| 1  | John        | Kennedy    | Marc       | Guy         |
| 2  | John        | Kennedy    | Olivier    | Oslo        |

My idea was to create a flag column with a condition: if the row above or below has the same values in FName_LVL1 and LName_LVL1, then put "1", else "0".

That would give a column looking like this:

| id | FName_LVL1  | LName_LVL1 | FName_LVL2 | LName_LVL2  | Flag |
|----|-------------|------------|------------|-------------|------|
| 1  | John        | Kennedy    | Marc       | Guy         | 1    |
| 2  | John        | Kennedy    | Olivier    | Oslo        | 1    |
| 3  | Mike        | Lanes      | Patrick    | James       | 0    |

With a table like this, I could just filter on the flag and get the result I want.

That's the way I'm used to working in Alteryx, but I'm not sure whether it is possible with SQL statements, or even whether it is the best way to tackle this case.

6 answers:

Answer 0 (score: 2)

You can use COUNT(*) as a window function.

SQL Fiddle

Query 1:

SELECT t.*
    ,CASE 
        WHEN COUNT(*) OVER (
                PARTITION BY fname_lvl1
                ,lname_lvl1
                ) > 1
            THEN 1
        ELSE 0
        END AS Flag
FROM t

Results:

| ID | FNAME_LVL1 | LNAME_LVL1 | FNAME_LVL2 | LNAME_LVL2 | FLAG |
|----|------------|------------|------------|------------|------|
|  1 |       John |    Kennedy |       Marc |        Guy |    1 |
|  2 |       John |    Kennedy |    Olivier |       Oslo |    1 |
|  3 |       Mike |      Lanes |    Patrick |      James |    0 |
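
To get from the flag to the filtered table in the question, one option is to wrap the query in a subquery and filter on the count, since a window function cannot be used directly in a WHERE clause. The sketch below is only an illustration. It runs the SQL through Python's built-in sqlite3 module so that it is self-contained (an assumption, not what the asker uses; SQLite 3.25 or later is needed for window functions), and the names `flagged` and `cnt` are made up here. The SQL itself should carry over to Oracle largely unchanged.

```python
import sqlite3

# In-memory stand-in for the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, fname_lvl1 TEXT, lname_lvl1 TEXT, "
    "fname_lvl2 TEXT, lname_lvl2 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Kennedy", "Marc", "Guy"),
        (2, "John", "Kennedy", "Olivier", "Oslo"),
        (3, "Mike", "Lanes", "Patrick", "James"),
    ],
)

# The inner query counts rows per (fname_lvl1, lname_lvl1) pair;
# the outer query keeps only the pairs that occur more than once.
query = """
SELECT id, fname_lvl1, lname_lvl1, fname_lvl2, lname_lvl2
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY fname_lvl1, lname_lvl1) AS cnt
    FROM t
) flagged
WHERE cnt > 1
ORDER BY id
"""
duplicate_rows = conn.execute(query).fetchall()
for row in duplicate_rows:
    print(row)
```

Filtering on `cnt > 1` rather than on the 0/1 flag skips the CASE expression, but `WHERE flag = 1` on the original query would work the same way.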

Answer 1 (score: 0)

no_of_records is a column that tells you how many times each name combination is present in the table, i.e. it will be 2 for John Kennedy in your example table.

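-- Table1 stands in for the real table name (TABLE is a reserved word).
-- Oracle does not accept AS before a table alias, so the aliases are written bare.
select t1.*
from Table1 t1
inner join
(
  -- one row per name combination, with its number of occurrences
  select FName_LVL1, LName_LVL1, count(*) as no_of_records
  from Table1
  group by FName_LVL1, LName_LVL1
) t2
  on t1.FName_LVL1 = t2.FName_LVL1
     and t1.LName_LVL1 = t2.LName_LVL1
     and t2.no_of_records > 1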
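
As a rough illustration of what the grouped subquery produces, the sketch below first runs it on its own and then joins it back to the rows. It uses Python's built-in sqlite3 module as a self-contained stand-in for Oracle (an assumption), the aliases `a` and `d` are made up here, and the `> 1` test is moved into a HAVING clause, which is a variation on the answer rather than the answer's exact query.

```python
import sqlite3

# In-memory stand-in for the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, fname_lvl1 TEXT, lname_lvl1 TEXT, "
    "fname_lvl2 TEXT, lname_lvl2 TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Kennedy", "Marc", "Guy"),
        (2, "John", "Kennedy", "Olivier", "Oslo"),
        (3, "Mike", "Lanes", "Patrick", "James"),
    ],
)

# Step 1: the grouped subquery alone, one row per name combination.
counts = conn.execute("""
SELECT fname_lvl1, lname_lvl1, COUNT(*) AS no_of_records
FROM table1
GROUP BY fname_lvl1, lname_lvl1
ORDER BY fname_lvl1, lname_lvl1
""").fetchall()
print(counts)

# Step 2: keep only combinations seen more than once (HAVING),
# then join back to recover the full rows.
joined = conn.execute("""
SELECT a.id, a.fname_lvl1, a.lname_lvl1, a.fname_lvl2, a.lname_lvl2
FROM table1 a
INNER JOIN (
    SELECT fname_lvl1, lname_lvl1
    FROM table1
    GROUP BY fname_lvl1, lname_lvl1
    HAVING COUNT(*) > 1
) d
  ON a.fname_lvl1 = d.fname_lvl1
 AND a.lname_lvl1 = d.lname_lvl1
ORDER BY a.id
""").fetchall()
print(joined)
```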

Answer 2 (score: 0)

You can use a "semi join" subquery to get a result like that:

SELECT * FROM Table1 t1
WHERE EXISTS (
  SELECT 'Anything' FROM Table1 t2
  WHERE t1.FName_LVL1 = t2.FName_LVL1
    AND t1.LName_LVL1 = t2.LName_LVL1
    AND t1.id <> t2.id
)

Demo: http://sqlfiddle.com/#!4/f9c44/3

| ID | FNAME_LVL1 | LNAME_LVL1 | FNAME_LVL2 | LNAME_LVL2 |
|----|------------|------------|------------|------------|
|  2 |       John |    Kennedy |    Olivier |       Oslo |
|  1 |       John |    Kennedy |       Marc |        Guy |

Answer 3 (score: 0)

You may prefer to use the LAG & LEAD analytic functions, with the contribution of NVL2:

select n.*,
       nvl2(lag(FName_LVL1||' '||LName_LVL1,1,null) over 
       (partition by FName_LVL1||' '||LName_LVL1 order by FName_LVL1, LName_LVL1),1,0)+
       nvl2(lead(FName_LVL1||' '||LName_LVL1,1,null) over 
       (partition by FName_LVL1||' '||LName_LVL1 order by FName_LVL1, LName_LVL1),1,0) flag
  from names n;

ID FNAME_LVL1   LNAME_LVL1  FNAME_LVL2  LNAME_LVL2  FLAG
--  ----------  ----------  ----------  ----------  -----
1    John        Kennedy      Marc        Guy         1
2    John        Kennedy      Olivier     Oslo        1
3    Mike        Lanes        Patrick     James       0

SQL Fiddle Demo
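
One thing to be aware of with the LAG plus LEAD sum: once a name appears three or more times, the rows in the middle of the group have both a previous and a next row, so their flag becomes 2 rather than 1. The sketch below illustrates this on hypothetical data (a third John Kennedy row that is not in the question). It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption), replaces NVL2 with an equivalent CASE expression because SQLite has no NVL2, and orders each partition by id to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names (id INTEGER, fname_lvl1 TEXT, lname_lvl1 TEXT)"
)
# Row 3 is hypothetical: it adds a third John Kennedy to the sample.
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        (1, "John", "Kennedy"),
        (2, "John", "Kennedy"),
        (3, "John", "Kennedy"),
        (4, "Mike", "Lanes"),
    ],
)

# 1 if a previous row exists in the group, plus 1 if a next row exists.
rows = conn.execute("""
SELECT id,
       CASE WHEN LAG(id) OVER (PARTITION BY fname_lvl1, lname_lvl1
                               ORDER BY id) IS NOT NULL
            THEN 1 ELSE 0 END
     + CASE WHEN LEAD(id) OVER (PARTITION BY fname_lvl1, lname_lvl1
                                ORDER BY id) IS NOT NULL
            THEN 1 ELSE 0 END AS flag
FROM names
ORDER BY id
""").fetchall()

flags = dict(rows)
print(flags)
```

Filtering on `flag > 0` rather than `flag = 1` keeps every duplicated row regardless of group size.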

Answer 4 (score: 0)

The most efficient way is to use the partition by clause, which needs only one table scan. I have saved the output in LiveSQL.

drop table t1 purge;
create table t1 ( c1 varchar2(20), c2 varchar2(20), c3 varchar2(20), c4 varchar2(20));
insert into t1 values ('John','Kennedy','Marc','Guy');
insert into t1 values ('John','Kennedy','Olivier','Oslo');
insert into t1 values ('not','john','vijay','balebail');
commit;
select t1.*, count(c1||c2) over (partition by c1,c2 order by c1,c2  ) flag from t1;
select t1.*, decode (count(c1||c2) over (partition by c1,c2 order by c1,c2  ),1,0,1) flag from t1;

C1      C2      C3      C4       FLAG
John    Kennedy Marc    Guy         2
John    Kennedy Olivier Oslo        2
not     john    vijay   balebail    1

3 rows selected.

select t1.*, decode (count(c1||c2) over (partition by c1,c2 order by c1,c2  ),1,0,1) flag from t1

C1      C2      C3      C4       FLAG
John    Kennedy Marc    Guy         1
John    Kennedy Olivier Oslo        1
not     john    vijay   balebail    0

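Answer 5 (score: 0)

Thank you all! It seems there really are a lot of solutions for this case!

I will dig into them and see which one I like best, but thanks to you I now have a good picture of the SQL logic.

Sorry for my delayed reply, I was not able to be at work.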