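Oracle - replace substrings of records using a list of regular expressions

Time: 2015-02-26 13:49:53

Tags: sql regex oracle plsql

I want to clean up a table containing alphanumeric (varchar2) records. Each record should be checked against a set of dirty words, and any dirty word found should be replaced. The patterns and their replacements are stored in a separate table.

Example: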

create table to_clean (
text_id number,
dirty_text  varchar2(4000));

insert into to_clean values(1,'hello this is my dirtyword1 text.');
insert into to_clean values(2,'hello this is my dirtyword3 text.');
insert into to_clean values(3,'hello this is my dirtyword2 dirtyword1  text.');

create table regex_list(
pattern varchar2(400),
replacement varchar2(400));

insert into regex_list values('dirtyword1','clean1');
insert into regex_list values('dirtyword2',' '); --remove totally
insert into regex_list values('dirtyword3','clean3');

Pseudocode:

for each dirty_text in to_clean
    for pattern, replacement in regex_list
        dirty_text = regexp_replace(dirty_text, pattern, replacement)

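What is the best way to solve this in Oracle? regex_list contains both regular expressions and plain strings as patterns. I only want to replace complete words (separated by spaces), not parts of words.

1 Answer:

Answer 0: (score: 2)

Try this: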

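-- Works only when each text matches at most one pattern: the scalar subquery
-- for find_pat raises ORA-01427 if several patterns match (as for text_id 3).
-- Rows that match no pattern are set to NULL.
UPDATE to_clean c
   SET dirty_text =
          (SELECT REGEXP_REPLACE (s.dirty_text, r.pattern, r.replacement) replaced
             FROM    regex_list r
                  INNER JOIN
                     (SELECT t.*,
                             (SELECT pattern
                                FROM regex_list
                               WHERE INSTR (t.dirty_text, pattern) <> 0)
                                find_pat
                        FROM to_clean t) s
                  ON (r.pattern = s.find_pat)
            WHERE c.dirty_text = s.dirty_text);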

If your patterns are expressions rather than plain words, INSTR will not work; use REGEXP_LIKE instead (as Justin Cave suggested):

SELECT REGEXP_REPLACE (dirty_text, pattern, replacement) replaced
             FROM    regex_list r
                  INNER JOIN
                     (SELECT t.*,
                             (SELECT pattern
                                FROM regex_list
                               WHERE regexp_like(T.DIRTY_TEXT,pattern) )
                                find_pat
                        FROM to_clean t) s
                  ON (r.pattern = s.find_pat) 
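
To illustrate the difference, here is a minimal sketch. The sample string and the pattern dirtyword[0-9] are made up for illustration and are not part of the question's data. INSTR treats the pattern as a literal string, while REGEXP_LIKE interprets it as a regular expression:

```sql
-- Hypothetical inputs: a sample text and a pattern with a character class.
select instr('hello dirtyword3 text', 'dirtyword[0-9]') as instr_pos,
       case
          when regexp_like('hello dirtyword3 text', 'dirtyword[0-9]')
          then 'match'
          else 'no match'
       end as regexp_result
  from dual;
```

With these inputs INSTR returns 0, because the literal text dirtyword[0-9] does not occur in the string, whereas REGEXP_LIKE reports a match.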

Edit

In that case you can use PL/SQL. Take a look:

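--Create Oracle object types (dirty_text sized to match to_clean.dirty_text)
create or replace type clean_o as object(text_id number, dirty_text varchar2(4000));
/
create or replace type clean_t as table of clean_o;
/

--Pipelined function: applies every pattern in regex_list to each row of to_clean
create or replace function clean_text return clean_t pipelined is
    cursor clean_c is select * from to_clean;
    text varchar2(4000);  -- large enough for any dirty_text value
begin
    for c in clean_c loop
       text := c.dirty_text;
       for i in (select * from regex_list) loop
          text := regexp_replace(text, i.pattern, i.replacement);
       end loop;
       pipe row (clean_o(c.text_id, text));
    end loop;
    return;  -- a pipelined function ends with a RETURN that carries no value
end;
/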

Now you can do this:

select * from table(clean_text);
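
The question also asks for whole-word replacement and for the table itself to be cleaned, which the function above does not cover. One possible approach, sketched here under assumptions, is an anonymous block that wraps each pattern in space-or-boundary groups and writes the result back. It assumes that text_id uniquely identifies a row and that the stored patterns contain no capturing groups of their own, since the wrapper relies on the group numbers \1 and \3. The names l_text and p are illustrative.

```sql
-- Sketch only: replace whole words and store the cleaned text.
-- Assumption: text_id is unique in to_clean.
-- Assumption: patterns in regex_list have no capturing groups.
declare
    l_text to_clean.dirty_text%type;
begin
    for c in (select text_id, dirty_text from to_clean) loop
        l_text := c.dirty_text;
        for p in (select pattern, replacement from regex_list) loop
            -- Group 1 is the start of the string or a whitespace character,
            -- group 3 is a whitespace character or the end of the string.
            -- \1 and \3 put those boundaries back around the replacement.
            l_text := regexp_replace(
                          l_text,
                          '(^|[[:space:]])(' || p.pattern || ')([[:space:]]|$)',
                          '\1' || p.replacement || '\3');
        end loop;
        update to_clean
           set dirty_text = l_text
         where text_id = c.text_id;
    end loop;
end;
/
```

One limitation of this wrapper: the whitespace after a match is consumed by that match, so when two dirty words are separated by a single space, the second one is not replaced in the same pass. The block leaves the commit to the caller.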