Remove LEADING and TRAILING keywords from a string in PL/SQL

Date: 2014-09-07 12:57:09

Tags: regex string oracle function plsql

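I need to remove certain keywords from an input string and return the new string. The keywords are stored in another table, e.g. MR, MRS, DR, PVT, PRIVATE, CO, COMPANY, LTD, LIMITED, etc. There are two kinds of keywords: LEADING (MR, MRS, DR) and TRAILING (PVT, PRIVATE, CO, COMPANY, LTD, LIMITED, etc.).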

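So if a keyword is LEADING, it must be removed from the beginning of the string, and if it is TRAILING, it must be removed from the end. For example:

MR Jones MRS COMPANY should return JONES MRS
MR MRS Jones PVT COMPANY should return MRS JONES PVT (in the first pass MR and COMPANY are trimmed, leaving MRS JONES PVT)

Only the first occurrence of a reserved keyword at the beginning or end of the input string should be removed. If a LEADING keyword occurs several times at the start, only the first one is removed and the others stay, as in the examples above; the same applies to TRAILING keywords.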
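To pin down these rules, here is a minimal word-based sketch in Java. It is not the asker's code: the keyword sets are hard-coded stand-ins for the active rows of RSRV_KEY_WORDS, and the class and method names are hypothetical. It removes at most one leading keyword (only if it is the first word) and at most one trailing keyword (only if it is the last word). It also assumes a name should never be reduced to nothing, so a single remaining word is always kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordStripper {
    // Hypothetical keyword sets; in the question they come from RSRV_KEY_WORDS.
    static final Set<String> LEADING = Set.of("MR", "MRS", "DR");
    static final Set<String> TRAILING =
        Set.of("PVT", "PRIVATE", "CO", "COMPANY", "LTD", "LIMITED");

    static String strip(String input) {
        // Upper-case and trim like the PL/SQL function, then split on runs of blanks.
        String trimmed = input.trim().toUpperCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        List<String> words = new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
        // Drop the first word only if it is a LEADING keyword (first occurrence only).
        if (words.size() > 1 && LEADING.contains(words.get(0))) {
            words.remove(0);
        }
        // Drop the last word only if it is a TRAILING keyword (first occurrence only).
        if (words.size() > 1 && TRAILING.contains(words.get(words.size() - 1))) {
            words.remove(words.size() - 1);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(strip("MR Jones MRS COMPANY"));     // JONES MRS
        System.out.println(strip("MR MRS Jones PVT COMPANY")); // MRS JONES PVT
    }
}
```

The `words.size() > 1` guards are a design choice. The question's function would return NULL for an input such as `MR LTD`, whereas this sketch returns `LTD`.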

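I have written the function below. It works, but it is not efficient, and I believe its performance could be improved considerably (perhaps by using regular expressions). Here is the function: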

CREATE OR REPLACE FUNCTION replace_keyword (p_in_name IN VARCHAR2)
RETURN VARCHAR2
 IS
 l_name   VARCHAR2 (4000);
 l_keyword_found BOOLEAN;

  CURSOR c IS
  SELECT *
    FROM RSRV_KEY_WORDS
   WHERE ACTIVE = 'Y'
   AND upper(POSITION)  in ('LEADING', 'TRAILING'); 

 BEGIN
 --Remove the leading and trailing blank spaces
 l_name := TRIM (UPPER (p_in_name)); 


 --remove LEADING keywords
   l_keyword_found := false;
   for rec in c LOOP
        IF     UPPER (rec.POSITION) = 'LEADING'
         AND SUBSTR(l_name, 1,INSTR(l_name,' ',1) - 1) = rec.key_word 
         AND l_keyword_found = false
        THEN 
            l_name := SUBSTR(l_name,INSTR(l_name,' ',1)+1);
            l_keyword_found := true;
        END IF;
        EXIT  WHEN (l_keyword_found);
   END LOOP;

 --Remove multiple spaces in a word and replace with single blank space
   l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' '); 
 --Remove the leading and trailing blank spaces
   l_name := TRIM (l_name);  

 --remove TRAILING keywords
   l_keyword_found := false;
   for rec in c LOOP
        IF     UPPER (rec.POSITION) = 'TRAILING'
         AND SUBSTR(l_name, INSTR(l_name,' ',-1) + 1) = rec.key_word
         AND l_keyword_found = false
        THEN 
            l_name := SUBSTR(l_name,1,INSTR(l_name,' ',-1)-1);  
            l_keyword_found := true;
        END IF;
        EXIT  WHEN (l_keyword_found);
   END LOOP;
 --Remove multiple spaces in a word and replace with single blank space
   l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' '); 
 --Remove the leading and trailing blank spaces
   l_name := TRIM (l_name); 
   return l_name;
 EXCEPTION
   WHEN OTHERS
   THEN
      raise_application_error (
         -20001,
         'An error was encountered - ' || SQLCODE || ' -ERROR- ' || SQLERRM);
 END;
/

1 answer:

Answer 0 (score: 0)

I can't really say this will be faster, but I would try the following:

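Assuming the keywords in RSRV_KEY_WORDS don't change often, I would create a function that generates the regular expression from the table and let Oracle cache the result: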

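create or replace function get_lead_and_trail_regexp return varchar2
  result_cache relies_on (RSRV_KEY_WORDS) is
   -- One row holding the active keywords of each kind as alternations, e.g. 'DR|MR|MRS'
   CURSOR c IS
     SELECT ( SELECT listagg(key_word,'|') within group (order by key_word)
              FROM   RSRV_KEY_WORDS
              WHERE  ACTIVE = 'Y'
              AND    upper(POSITION) = 'LEADING' ) as leading_kw,
            ( SELECT listagg(key_word,'|') within group (order by key_word)
              FROM   RSRV_KEY_WORDS
              WHERE  ACTIVE = 'Y'
              AND    upper(POSITION) = 'TRAILING' ) as trailing_kw
     FROM dual;
begin
  for rec in c loop
    -- First alternative: a leading keyword at the very start, followed by blanks.
    -- Second alternative: blanks followed by a trailing keyword at the very end.
    -- [ ]* (not [ ]+) at the outer edges, because the input has already been trimmed.
    return '(^[ ]*(('||rec.leading_kw||')[ ]+))|([ ]+(('||rec.trailing_kw||')[ ]*)$)';
  end loop;
  return null; -- Not very likely
end get_lead_and_trail_regexp;
/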
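The same build-once idea can be sketched outside the database. This Java version assumes the two keyword lists are what the LISTAGG subqueries return; they are hard-coded here, and the class and method names are hypothetical. It builds the pattern string on first use and returns the cached copy afterwards, which is roughly what RESULT_CACHE does for the function above. The keywords are assumed to contain only letters, so no regex metacharacters need escaping.

```java
import java.util.List;
import java.util.regex.Pattern;

public class KeywordPattern {
    // Hypothetical stand-ins for the two LISTAGG results (active rows of
    // RSRV_KEY_WORDS), sorted alphabetically as ORDER BY key_word would sort them.
    static final List<String> LEADING = List.of("DR", "MR", "MRS");
    static final List<String> TRAILING =
        List.of("CO", "COMPANY", "LIMITED", "LTD", "PRIVATE", "PVT");

    // Built on first use and then reused, loosely mimicking RESULT_CACHE.
    private static String cached;

    static synchronized String leadAndTrailRegexp() {
        if (cached == null) {
            String leading = String.join("|", LEADING);
            String trailing = String.join("|", TRAILING);
            // First alternative: a leading keyword at the very start plus its blanks.
            // Second alternative: blanks plus a trailing keyword at the very end.
            cached = "(^[ ]*((" + leading + ")[ ]+))|([ ]+((" + trailing + ")[ ]*)$)";
        }
        return cached;
    }

    public static void main(String[] args) {
        String regex = leadAndTrailRegexp();
        Pattern.compile(regex); // throws PatternSyntaxException if the string is malformed
        System.out.println(regex);
    }
}
```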

You can then use the regular expression to remove the first leading and the first trailing keyword in a single call:

l_name := REGEXP_REPLACE (l_name, get_lead_and_trail_regexp , ' ');

Then collapse any repeated blanks and trim the result, as the original function already does.

I tested the regular expression with java.lang.String.replaceAll because I don't have an Oracle database available at the moment, but I believe it also works with REGEXP_REPLACE.
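A harness along those lines might look like this. It is a sketch, not the answerer's actual test. The pattern is the one the corrected function would produce for a hypothetical keyword set, and String.replaceAll stands in for REGEXP_REPLACE with its default of replacing every match. Without the MULTILINE flag, Java's `^` and `$` match only at the start and end of the input, so each alternative can fire at most once.

```java
import java.util.Locale;

public class RegexHarness {
    // Pattern the corrected function would return for a hypothetical keyword set.
    static final String REGEX =
        "(^[ ]*((DR|MR|MRS)[ ]+))|([ ]+((CO|COMPANY|LIMITED|LTD|PRIVATE|PVT)[ ]*)$)";

    static String clean(String input) {
        String name = input.trim().toUpperCase(Locale.ROOT);
        // Like REGEXP_REPLACE(l_name, regexp, ' '): every match becomes one blank.
        name = name.replaceAll(REGEX, " ");
        // Collapse runs of whitespace and trim, as the question's function does.
        return name.replaceAll("\\s{2,}", " ").trim();
    }

    public static void main(String[] args) {
        String[] samples = {"MR Jones MRS COMPANY", "MR MRS Jones PVT COMPANY", "ACME LTD"};
        for (String s : samples) {
            System.out.println(s + " -> " + clean(s));
        }
    }
}
```

If a keyword ever contains a regex metacharacter (a dot in a hypothetical `CO.`, for instance), it would need escaping before being joined into the pattern, in Java and in Oracle alike.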