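I need to remove certain keywords from an input string and return the new string. The keywords are stored in a separate table, for example MR, MRS, DR, PVT, PRIVATE, CO, COMPANY, LTD, LIMITED, and so on. There are two kinds of keywords: LEADING (MR, MRS, DR) and TRAILING (PVT, PRIVATE, CO, COMPANY, LTD, LIMITED, etc.).
If a keyword is LEADING, it has to be removed from the beginning of the string; if it is TRAILING, it has to be removed from the end. For example, MR Jones MRS COMPANY should return JONES MRS, and MR MRS Jones PVT COMPANY should return MRS JONES PVT (MR and COMPANY are trimmed, which leaves MRS JONES PVT).
Only the first occurrence of a reserved keyword at the beginning or at the end of the input string should be removed. If several LEADING keywords appear at the beginning, only the first one is removed and the others are kept, as in the example above; the same applies to TRAILING keywords.
I have written the function below. It works, but it is not efficient, and I believe its performance could be improved considerably (perhaps with regular expressions). Here is the function: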
CREATE OR REPLACE FUNCTION replace_keyword (p_in_name IN VARCHAR2)
   RETURN VARCHAR2
IS
   l_name            VARCHAR2 (4000);
   l_keyword_found   BOOLEAN;

   CURSOR c IS
      SELECT *
        FROM RSRV_KEY_WORDS
       WHERE ACTIVE = 'Y'
         AND upper(POSITION) in ('LEADING', 'TRAILING');
BEGIN
   --Remove the leading and trailing blank spaces
   l_name := TRIM (UPPER (p_in_name));

   --remove LEADING keywords
   l_keyword_found := false;
   for rec in c LOOP
      IF     UPPER (rec.POSITION) = 'LEADING'
         AND SUBSTR(l_name, 1,INSTR(l_name,' ',1) - 1) = rec.key_word
         AND l_keyword_found = false
      THEN
         l_name := SUBSTR(l_name,INSTR(l_name,' ',1)+1);
         l_keyword_found := true;
      END IF;
      EXIT WHEN (l_keyword_found);
   END LOOP;

   --Remove multiple spaces in a word and replace with single blank space
   l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' ');
   --Remove the leading and trailing blank spaces
   l_name := TRIM (l_name);

   --remove TRAILING keywords
   l_keyword_found := false;
   for rec in c LOOP
      IF     UPPER (rec.POSITION) = 'TRAILING'
         AND SUBSTR(l_name, INSTR(l_name,' ',-1) + 1) = rec.key_word
         AND l_keyword_found = false
      THEN
         l_name := SUBSTR(l_name,1,INSTR(l_name,' ',-1)-1);
         l_keyword_found := true;
      END IF;
      EXIT WHEN (l_keyword_found);
   END LOOP;

   --Remove multiple spaces in a word and replace with single blank space
   l_name := REGEXP_REPLACE (l_name, '[[:space:]]{2,}', ' ');
   --Remove the leading and trailing blank spaces
   l_name := TRIM (l_name);

   return l_name;
EXCEPTION
   WHEN OTHERS
   THEN
      raise_application_error (
         -20001,
         'An error was encountered - ' || SQLCODE || ' -ERROR- ' || SQLERRM);
END;
/
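Answer 0 (score: 0)
I can't really say whether this will be faster, but I would give it a try:
Assuming the keywords in RSRV_KEY_WORDS do not change often, I would create a function that builds the regular expression from the table, and let Oracle cache the result: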
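create or replace function get_lead_and_trail_regexp return varchar2
   result_cache relies_on (RSRV_KEY_WORDS) is
   -- No DECLARE keyword here: a function's declarations follow IS directly.
   -- One row, with one '|'-separated alternation per keyword position.
   CURSOR c IS
      SELECT ( SELECT listagg(key_word,'|') within group (order by 1)
                 FROM RSRV_KEY_WORDS
                WHERE ACTIVE = 'Y'
                  AND upper(POSITION) = 'LEADING' ) as leading,
             ( SELECT listagg(key_word,'|') within group (order by 1)
                 FROM RSRV_KEY_WORDS
                WHERE ACTIVE = 'Y'
                  AND upper(POSITION) = 'TRAILING' ) as trailing
        FROM dual;
begin
   for rec in c loop
      -- A leading keyword at the start, or a trailing keyword at the end.
      -- [ ]* at the string boundaries lets the pattern match trimmed input.
      -- Keywords are not escaped, so they must not contain regex metacharacters.
      return '^[ ]*('||rec.leading||')[ ]+|[ ]+('||rec.trailing||')[ ]*$';
   end loop;
   return null; -- Not very likely
end get_lead_and_trail_regexp;
/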
You can then use the regular expression to remove the first leading and the first trailing keyword in a single pass:
l_name := REGEXP_REPLACE (l_name, get_lead_and_trail_regexp , ' ');
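Then carry on with removing any duplicate spaces and trimming the result, as the original function does.
I tested the regular expression with java.lang.String.replaceAll because I don't have an Oracle database available at the moment, but I believe it will work with REGEXP_REPLACE as well.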
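For anyone who wants to repeat that kind of check outside the database, here is a minimal Java sketch. It rests on several assumptions: the keyword lists are hard-coded from the examples in the question rather than read from RSRV_KEY_WORDS, the class and method names (KeywordStripDemo, alternation, stripKeywords) are made up for illustration, and the pattern is a variant that uses [ ]* at the string boundaries so that it matches input that has already been trimmed. Java's regex engine is not identical to Oracle's, so this only checks the logic of the pattern, not REGEXP_REPLACE itself.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordStripDemo {
    // Hypothetical keyword lists taken from the question's examples;
    // in the database they would come from RSRV_KEY_WORDS.
    static final List<String> LEADING = List.of("MR", "MRS", "DR");
    static final List<String> TRAILING =
            List.of("PVT", "PRIVATE", "CO", "COMPANY", "LTD", "LIMITED");

    // Join keywords into "A|B|C", quoting each so that regex
    // metacharacters in a keyword are treated literally.
    static String alternation(List<String> keywords) {
        return keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    // One leading keyword at the start, or one trailing keyword at the end.
    static final Pattern KEYWORDS = Pattern.compile(
            "^[ ]*(" + alternation(LEADING) + ")[ ]+"
            + "|[ ]+(" + alternation(TRAILING) + ")[ ]*$");

    static String stripKeywords(String name) {
        String s = name.trim().toUpperCase();
        // "^" and "$" can each match only once, so at most one leading
        // and one trailing keyword are replaced.
        s = KEYWORDS.matcher(s).replaceAll(" ");
        // Collapse repeated spaces, then trim the padding left behind.
        return s.replaceAll("[ ]{2,}", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripKeywords("MR Jones MRS COMPANY"));     // JONES MRS
        System.out.println(stripKeywords("MR MRS Jones PVT COMPANY")); // MRS JONES PVT
    }
}
```

Both examples from the question come out as expected, because the start and end anchors limit the replacement to a single keyword on each side.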