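I need help with regular expression replacement in Oracle. I want to replace every occurrence of a word or phrase in a document, as long as that word or phrase is not already inside a set of tags. The tags are defined by me (they are not HTML or XML); my current idea is: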
<term type=pos id=123>some phrase</term>
I have created a wrapper function around regexp_replace (not yet working correctly), which looks like this:
FUNCTION ANNOTATE_ONE_TERM(IN_TEXT IN VARCHAR2, SEARCH_TERM IN VARCHAR2, TERM_TYPE IN VARCHAR2, RECORD_ID IN NUMBER) RETURN CLOB
IS
REGEX_SEARCH VARCHAR2(512);
REGEX_REPLACE VARCHAR2(512);
BEGIN
REGEX_SEARCH := '((<TERM.*?</TERM>|[^<])*?)(^|\W)('|| SEARCH_TERM ||')($|\W)';
REGEX_REPLACE := '\1 <TERM ID='|| TO_CHAR(RECORD_ID)||' TYPE=' || TERM_TYPE ||'>'|| SEARCH_TERM ||'</TERM> ';
DBMS_OUTPUT.PUT_LINE('REGEX_SEARCH = ' || REGEX_SEARCH);
DBMS_OUTPUT.PUT_LINE('REGEX_REPLACE = ' || REGEX_REPLACE);
RETURN TRIM(REGEXP_REPLACE(IN_TEXT, REGEX_SEARCH, REGEX_REPLACE,1,0,'in'));
END ANNOTATE_ONE_TERM;
When it is called like this:
SELECT ANNOTATE_ONE_TERM(
ANNOTATE_ONE_TERM('dog elephant dog cat cat dog dogfish fishdog mouse dog', 'DOG CAT', 'POS', 123),
'DOG', 'POS',456)
FROM DUAL;
it returns:
<TERM ID=456 TYPE=POS>DOG</TERM> elephant <TERM ID=123 TYPE=POS>DOG CAT</TERM> cat <TERM ID=456 TYPE=POS>DOG</TERM> dogfish fishdog mouse <TERM ID=456 TYPE=POS>DOG</TERM>
which is correct. But if it is called like this:
SELECT ANNOTATE_ONE_TERM(
ANNOTATE_ONE_TERM('elephant dog cat cat dogfish fishdog mouse', 'DOG CAT', 'POS', 123),
'DOG', 'POS',456)
FROM DUAL;
it returns:
elephant <TERM ID=123 TYPE=POS <TERM ID=456 TYPE=POS>DOG</TERM> CAT</TERM> cat dogfish fishdog mouse
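which is wrong. It seems to consume the ">" and then find the word/phrase inside the tag.
I am actively trying to improve my understanding of regular expressions, but so far I have not been able to work this one out.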
Answer 0 (score: 1)
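I see that you are trying to match the "negative" case (text that is not inside a tag). I tried a direct match instead, using the closing tag </TERM> as one of the allowed left-hand delimiters, and this seems to work: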
create or replace FUNCTION ANNOTATE_ONE_TERM(IN_TEXT IN VARCHAR2,
SEARCH_TERM IN VARCHAR2,
TERM_TYPE IN VARCHAR2,
RECORD_ID IN NUMBER)
RETURN CLOB IS
REGEX_SEARCH VARCHAR2(512);
REGEX_REPLACE VARCHAR2(512);
BEGIN
REGEX_SEARCH := '(</TERM>| |^)' || SEARCH_TERM || '( |$)';
REGEX_REPLACE := '\1<TERM ID=' || TO_CHAR(RECORD_ID) || ' TYPE='
|| TERM_TYPE || '>' || SEARCH_TERM || '</TERM>\2';
RETURN TRIM(REGEXP_REPLACE(IN_TEXT, REGEX_SEARCH, REGEX_REPLACE,1,0,'in'));
END ANNOTATE_ONE_TERM;
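The core of the function is that the delimiters on both sides of the term are captured and then written back with \1 and \2, so they take part in the match without being lost. A minimal sketch of just that idea, using a made-up input string and a made-up [dog] marker instead of the real tag:

```sql
-- Capture the left and right delimiters, then re-emit them around the replacement.
-- 'cat dog bird' and the [dog] marker are illustrative only.
SELECT REGEXP_REPLACE('cat dog bird',
                      '( |^)dog( |$)',   -- group 1: left delimiter, group 2: right delimiter
                      '\1[dog]\2',       -- put both delimiters back unchanged
                      1, 0, 'i') AS result
FROM DUAL;
-- result: cat [dog] bird
```

One thing to watch for, as far as I can tell: because the delimiter is consumed as part of the match, two occurrences separated by a single space compete for that space, and the second one may be left untagged.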
With this function, the first call:
SELECT ANNOTATE_ONE_TERM(
ANNOTATE_ONE_TERM('dog elephant dog cat cat dog dogfish fishdog mouse dog',
'DOG CAT', 'POS', 123),
'DOG', 'POS',456)
FROM DUAL;
gives:
<TERM ID=456 TYPE=POS>DOG</TERM> elephant <TERM ID=123 TYPE=POS>DOG CAT</TERM> cat <TERM ID=456 TYPE=POS>DOG</TERM> dogfish fishdog mouse <TERM ID=456 TYPE=POS>DOG</TERM>
and
SELECT ANNOTATE_ONE_TERM(
ANNOTATE_ONE_TERM('elephant dog cat cat dogfish fishdog mouse',
'DOG CAT', 'POS', 123),
'DOG', 'POS',456)
FROM DUAL;
gives:
elephant <TERM ID=123 TYPE=POS>DOG CAT</TERM> cat dogfish fishdog mouse
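As expected, there are no nested terms. You have to use a trick like this because Oracle does not support lookahead/lookbehind assertions (at least not in my version, 11g).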
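As for why the pattern in the question tagged DOG inside an existing tag: my reading is that nothing forces a match to start at the beginning of the text or right after the previous match. The first group cannot step over a "<" except as part of a whole <TERM...</TERM> element, but the engine is free to start a match one character after the "<". From there "TERM ID=123 TYPE=POS" is just a run of non-"<" characters, and the ">" satisfies \W. A sketch for checking this, using the standard REGEXP_INSTR and REGEXP_SUBSTR functions on the intermediate string from the failing example (the inline view and column aliases are only for illustration):

```sql
-- Show where the question's original pattern starts matching, and what it matches,
-- in the string produced by the inner ANNOTATE_ONE_TERM call.
SELECT REGEXP_INSTR(txt, pat, 1, 1, 0, 'in') AS match_start,   -- position of the first match
       REGEXP_SUBSTR(txt, pat, 1, 1, 'in')   AS matched_text   -- full text of the first match
FROM (
  SELECT 'elephant <TERM ID=123 TYPE=POS>DOG CAT</TERM> cat dogfish fishdog mouse' AS txt,
         '((<TERM.*?</TERM>|[^<])*?)(^|\W)(DOG)($|\W)' AS pat
  FROM DUAL
);
```

If this reading is right, match_start should point just past the "<" of the existing tag, which is consistent with the broken output shown in the question, where the "<" survives and the ">" is swallowed.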