I want to remove duplicate strings from Col B. For example, "New Cap Grp" is repeated five times in the second record.
Col A Col B
----- -----
WDSA ALT COMPANY, III & New Group
1101 New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp
2255 Tata Associates Inc. & Tata Associates Inc.& Towers Watson
3355 Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.
8877 Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd
I am new to regular expressions, so I can't figure out how to do this. If anyone knows how to handle this case, please help.
Answer 0 (score: 1)
I don't think this can be done with a regular expression alone, because you have to update the Col B values.
It is easier in PL/SQL. Here is my attempt:
create table test
(
id number,
text varchar2(100)
);
insert into test values (1, 'ALT COMPANY, III & New Group');
insert into test values (2, 'New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp');
insert into test values (3, 'Tata Associates Inc. & Tata Associates Inc.& Towers Watson');
insert into test values (4, 'Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.');
insert into test values (5, 'Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd');
declare
  -- accumulates the de-duplicated parts for the current row
  l_new_column_value varchar2(1024) := '';
begin
  -- iterate over every row
  for x in (select id, text from test)
  loop
    -- split the row's text on '&' into one row per part, trim each part,
    -- and keep only the distinct values
    -- (note: DISTINCT does not guarantee the original order of the parts)
    for concat_text in (
      select distinct trim(regexp_substr(text, '[^&]+', 1, level)) as part_value
      from
      (
        select text
        from test
        where id = x.id
      )
      connect by instr(text, '&', 1, level - 1) > 0)
    loop
      -- build the new value from the unique parts
      l_new_column_value := l_new_column_value || concat_text.part_value || ' & ';
    end loop;
    -- write the result back, dropping the trailing ' & '
    update test
    set text = substr(l_new_column_value, 1, length(l_new_column_value) - 3)
    where id = x.id;
    l_new_column_value := '';
  end loop;
end;
/
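To make the logic of the block above easier to follow, here is a minimal language-neutral sketch in Python of the same steps: split on '&', trim each part, drop repeats, and join again with ' & '. It is not Oracle code, and the function name dedupe_parts is made up for illustration. Unlike SELECT DISTINCT, which does not guarantee any order, this sketch keeps the parts in first-seen order.

```python
def dedupe_parts(text: str, sep: str = "&") -> str:
    """Split text on sep, trim each part, drop repeats, keep first-seen order."""
    seen = set()
    unique = []
    for part in text.split(sep):
        name = part.strip()
        # Skip empty fragments and anything already collected.
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    return f" {sep} ".join(unique)


if __name__ == "__main__":
    # Sample values taken from the question's Col B.
    rows = [
        "ALT COMPANY, III & New Group",
        "New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp",
        "Tata Associates Inc. & Tata Associates Inc.& Towers Watson",
        "Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.",
        "Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd",
    ]
    for row in rows:
        print(dedupe_parts(row))
```

The comparison is exact after trimming, so "Tata Associates Inc." and "Tata Associates Inc" (without the final dot) would be treated as different names.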