How to remove redundant data from a SQL query

Date: 2017-06-13 14:05:33

Tags: sql oracle

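Hi: I am pulling data from a table that has a primary ID code linked to each user, but an extra record is added every time a user changes his or her name. I am trying to pull a list of current users along with any old names they may have used in the past. I am using outer joins to get at least one former name and one additional former name.

Here is the query: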

select
PrimaryName.PM_PDM,
PrimaryName.PM_ID,
PrimaryName.PM_AltID,
PrimaryName.PM_Change,
PrimaryName.PM_FName,
PrimaryName.PM_LName,
OldNames1.ON_PDM,
OldNames1.ON_Change,
OldNames1.ON_ID,
OldNames1.ON_AltID,
OldNames1.ON_FName,
OldNames1.ON_LName,
OldNames2.O2_PDM,
OldNames2.O2_Change,
OldNames2.O2_ID,
OldNames2.O2_AltID,
OldNames2.O2_FName,
OldNames2.O2_LName

from
(select
S_PDM as PM_PDM,
S_ID as PM_ID,
S_FIRST_NAME as PM_FName,
S_LAST_NAME as PM_LName,
S_CHANGE_IND as PM_Change,
S_SURROGATE_ID as PM_AltID
from S
WHERE S_CHANGE_IND is null) PrimaryName,

(select
S_PDM as ON_PDM,
S_ID  as ON_ID,
S_FIRST_NAME as ON_FName,
S_LAST_NAME as ON_LName,
S_CHANGE_IND as ON_Change,
S_SURROGATE_ID as ON_AltID
from S
where S_CHANGE_IND = 'N') OldNames1,

(select
S_PDM as O2_PDM,
S_ID  as O2_ID,
S_FIRST_NAME as O2_FName,
S_LAST_NAME as O2_LName,
S_CHANGE_IND as O2_Change,
S_SURROGATE_ID as O2_AltID
from S
where S_CHANGE_IND = 'N') OldNames2


where (OldNames1.ON_PDM = PrimaryName.pm_pdm)
and
  (OldNames1.ON_PDM = OldNames2.O2_PDM (+)
   and
   OldNames1.ON_AltID <> OldNames2.O2_AltID (+))

order by 2

Here is a sample of my results:

PM_PDM  |PM_ID  |PM_ID2 |PM_CHANGE  |PM_FNAME   |PM_LNAME   |ON_PDM |ON_CHANGE  |ON_ID  |ON_ID2 |ON_FNAME   |ON_LNAME   |O2_PDM     |O2_CHANGE  |O2_ID2 |O2_ID  |O2_FNAME   |O2_LNAME
1111    |2222   |3333   |           |Betty      |Boop       |1111   |N          |2222   |4444   |Betty      |Smith      |1111       |N          |5555   |2222   |Betty      |Jones
1111    |2222   |3333   |           |Betty      |Boop       |1111   |N          |2222   |5555   |Betty      |Jones      |1111       |N          |4444   |2222   |Betty      |Smith

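I want only one row returned for the three names:

  1. Betty Boop 2. Betty Smith 3. Betty Jones

Right now, it is returning:

  1. Betty Boop 2. Betty Smith 3. Betty Jones
  1. Betty Boop 2. Betty Jones 3. Betty Smith

I know the last join is the cause, but I am not sure how to limit the result to a single row. The query otherwise works as intended; I just need to change it so that it returns only one row.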

2 answers:

Answer 0: (score: 1)

Change the last join condition to:

 OldNames1.ON_AltID < OldNames2.O2_AltID (+)

Explanation: you have two AltIDs, 4444 and 5555. 4444 <> 5555 is true, and so is 5555 <> 4444. Hence your existing condition...

 OldNames1.ON_AltID <> OldNames2.O2_AltID (+)

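...produces a cross join between the two former-name rows, which is why you get two records. Changing the condition to less-than gets rid of the cross join, because 5555 < 4444 is false and only the pairing 4444 < 5555 remains.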
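
To see the effect of the two operators in isolation, here is a minimal sketch on made-up rows. The `old_names` CTE and its column names are hypothetical, not the asker's table. It uses an inner self-join so that only the pairing logic is visible. In the original query the `(+)` markers make the join to OldNames2 an outer join, so an OldNames1 row with no larger AltID (5555 here) would still be returned with NULLs in the O2 columns and may need an additional filter.

```sql
-- Hypothetical sample data: two former-name rows for the same person.
with old_names(pdm, alt_id, lname) as (
  select 1111, 4444, 'Smith' from dual union all
  select 1111, 5555, 'Jones' from dual
)
select o1.lname as name1,
       o2.lname as name2
  from old_names o1
  join old_names o2
    on o2.pdm = o1.pdm
   and o1.alt_id < o2.alt_id;  -- keeps each unordered pair exactly once
```

With `<` this returns the single row (Smith, Jones). Replacing `<` with `<>` returns both (Smith, Jones) and (Jones, Smith), which is the duplication seen in the question.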

Answer 1: (score: 1)

Use row_number() to assign a number to each change, then join that data twice, using only the rows with RN = 1 and RN = 2:

with c as (select s.*, row_number() over (partition by pdm order by id desc) rn
             from s where chg = 'N')
select s.pdm, s.id, s.name, c1.id id1, c1.name name1, c2.id id2, c2.name name2
  from (select * from s where chg is null) s
  left join c c1 on c1.pdm = s.pdm and c1.rn = 1
  left join c c2 on c2.pdm = s.pdm and c2.rn = 2

Test:

with s(pdm, id, chg, name) as (select 1, 1, 'N',  'Smith' from dual union all
                               select 1, 2, 'N',  'Jones' from dual union all
                               select 1, 3, null, 'Brown' from dual),
     c as (select s.*, row_number() over (partition by pdm order by id desc) rn
             from s where chg = 'N')
select s.pdm, s.id, s.name, c1.id id1, c1.name name1, c2.id id2, c2.name name2
  from      (select * from s where chg is null) s
  left join c c1 on c1.pdm = s.pdm and c1.rn = 1
  left join c c2 on c2.pdm = s.pdm and c2.rn = 2


PDM  ID   NAME    ID1  NAME1   ID2  NAME2
---  ---  ------  ---  ------  ---  ------
  1    3  Brown     2  Jones     1  Smith
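
Mapped back onto the column names from the question, the same approach might look like the following sketch. It assumes that S_SURROGATE_ID is a reasonable column to order former names by (the question does not say which former name is the most recent), and like the test above it shows at most two former names per user.

```sql
-- Number the former-name rows per user; the ordering column is an assumption.
with c as (
  select s.*,
         row_number() over (partition by s_pdm
                            order by s_surrogate_id desc) as rn
    from s
   where s_change_ind = 'N'
)
select p.s_pdm,
       p.s_id,
       p.s_first_name,
       p.s_last_name,
       c1.s_surrogate_id as on_altid,
       c1.s_first_name   as on_fname,
       c1.s_last_name    as on_lname,
       c2.s_surrogate_id as o2_altid,
       c2.s_first_name   as o2_fname,
       c2.s_last_name    as o2_lname
  from s p
  left join c c1 on c1.s_pdm = p.s_pdm and c1.rn = 1  -- first former name, if any
  left join c c2 on c2.s_pdm = p.s_pdm and c2.rn = 2  -- second former name, if any
 where p.s_change_ind is null                         -- current name rows only
 order by p.s_id;
```

Because each `rn` value occurs at most once per S_PDM, every current-name row is matched to at most one `c1` row and one `c2` row, so a user appears on a single output line, assuming there is only one current-name row per S_PDM.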