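Remove duplicate words/strings from an Oracle table using regular expressions

Date: 2015-11-27 14:54:54

Tags: sql regex oracle oracle11g

I want to remove duplicate strings from Col B. For example, "New Cap Grp" is repeated five times in the second record.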

Col A   Col B
-----   -----
WDSA    ALT COMPANY, III & New Group
1101    New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp 
2255    Tata Associates Inc. & Tata Associates Inc.& Towers Watson 
3355    Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc. 
8877    Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd

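I am new to regular expressions, so I can't work out how to do this. If anyone knows how to handle this case, please help.

1 answer:

Answer 0 (score: 1)

I don't think a regular expression alone can do this, because you also have to update the values in Col B.

It is easier with PL/SQL. Here is what I tried:

Create a table for the test data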

create table test
    (
        id   number,
        text varchar2(100)
    );

Insert the test data

insert into test values (1, 'ALT COMPANY, III & New Group');
insert into test values (2, 'New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp');
insert into test values (3, 'Tata Associates Inc. & Tata Associates Inc.& Towers Watson');
insert into test values (4, 'Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.');
insert into test values (5, 'Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd');
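To see what the splitting step does on its own, here is a small sketch that breaks one literal string into its '&'-separated parts. It assumes Oracle 11g or later for REGEXP_COUNT; the sample string and the alias part_no are made up for illustration.

```sql
-- Split one literal string on '&' and trim each part.
-- One row is generated per part: number of '&' characters plus one.
select level as part_no,
       trim(regexp_substr('New Cap Grp & New Cap Grp & Tele Pvt.Ltd',
                          '[^&]+', 1, level)) as part_value
from dual
connect by level <= regexp_count('New Cap Grp & New Cap Grp & Tele Pvt.Ltd', '&') + 1;
```

The duplicates are still present at this stage; removing them is a separate step on the split rows.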

PL/SQL block:

declare
    l_new_column_value varchar2(1024);
begin
    -- iterate over every row of the table
    for x in (select id, text from test)
    loop
        l_new_column_value := null;
        -- split this row's text on '&', trim each part, and keep each part once;
        -- order by first occurrence, because DISTINCT alone guarantees no order
        for concat_text in (
            select trim(regexp_substr(text, '[^&]+', 1, level)) as part_value
            from
                (
                    select text
                    from test
                    where id = x.id
                )
            connect by instr(text, '&', 1, level - 1) > 0
            group by trim(regexp_substr(text, '[^&]+', 1, level))
            order by min(level))
        loop
            -- build the new value from the unique parts
            l_new_column_value := l_new_column_value || concat_text.part_value || ' & ';
        end loop;
        -- write the new value back, dropping the trailing ' & '
        update test
            set text = substr(l_new_column_value, 1, length(l_new_column_value) - 3)
        where id = x.id;
    end loop;
end;
/
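
As an alternative to the row-by-row loop, the same deduplication can probably be done in a single statement. This is only a sketch under a few assumptions: Oracle 11g Release 2 or later (for LISTAGG and REGEXP_COUNT), id uniquely identifies a row in test, and no value has more than 101 parts (the most a varchar2(100) value can hold). The aliases src, n, s, lvl, first_pos and new_text are made up for illustration.

```sql
-- Rebuild each text from its distinct '&'-separated parts,
-- keeping the parts in order of first occurrence.
merge into test t
using (
    select id,
           listagg(part_value, ' & ') within group (order by first_pos) as new_text
    from (
        -- one row per distinct trimmed part of each original row
        select src.id,
               trim(regexp_substr(src.text, '[^&]+', 1, n.lvl)) as part_value,
               min(n.lvl) as first_pos
        from test src
        -- number generator 1..101; assumed upper bound on parts per value
        join (select level as lvl from dual connect by level <= 101) n
          on n.lvl <= regexp_count(src.text, '&') + 1
        group by src.id, trim(regexp_substr(src.text, '[^&]+', 1, n.lvl))
    )
    group by id
) s
on (t.id = s.id)
when matched then
    update set t.text = s.new_text;
```

Neither version commits, so check the result with a select on test and then commit or roll back yourself.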