PL/SQL: parsing UTF-8 characters in a CLOB with a REGEXP_LIKE regular expression

Time: 2018-05-08 11:01:19

Tags: regex oracle plsql

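I want to check whether any line of my CLOB contains unexpected characters such as (ç). The data is read from a CSV file with an encoding I did not expect (UTF-8), and some of the characters get mangled on the way in.

I tried to filter each line with a regular expression, but it does not work as expected. Is there a way to find out the encoding of the CSV file while reading it?

How can I fix the regular expression so that it accepts only lines made up of these characters: a-zA-Z 0-9 .,;:"'()-_& space tab?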

Example CLOB read from the CSV:

  l_clob clob :='
"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"
 "xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAME","08/03/2018",,"0","0","0"
';

Another CLOB, with the code I am using:

DECLARE
    l_clob   CLOB
        := '"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"
             "xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAME","08/03/2018",,"0","0","0"';
    l_offset             PLS_INTEGER := 1;
    l_line               VARCHAR2 (32767);
    csvregexp   CONSTANT VARCHAR2 (1000)
        := '^([''"]+[-&\s(a-z0-9)]*[''"]+[,:;\t\s]?)?[''"]+[-&\s(a-z0-9)]*[''"]+' ;
    l_total_length       PLS_INTEGER := LENGTH (l_clob);
    l_line_length        PLS_INTEGER;
BEGIN

    WHILE l_offset <= l_total_length
    LOOP
        l_line_length := INSTR (l_clob, CHR (10), l_offset) - l_offset;

        IF l_line_length < 0
        THEN
            l_line_length := l_total_length + 1 - l_offset;
        END IF;

        l_line := SUBSTR (l_clob, l_offset, l_line_length);

        -- The 'i' match parameter makes the match case-insensitive.
        IF REGEXP_LIKE (l_line, csvregexp, 'i')
        THEN
            DBMS_OUTPUT.put_line ('Ok');
            DBMS_OUTPUT.put_line (l_line);
        ELSE
            DBMS_OUTPUT.put_line ('Error');
            DBMS_OUTPUT.put_line (l_line);
        END IF;

        l_offset := l_offset + l_line_length + 1;
    END LOOP;
END;
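
One possible direction, offered as a sketch rather than a confirmed fix: the pattern above has no `$` anchor, so it only validates the start of each line, and as far as I know Oracle does not interpret `\s` inside a bracket expression (there it would match a literal backslash or an `s`). A whitelist anchored at both ends may be closer to what is wanted. Assumptions in the block below: `/` is added to the allowed set because the sample dates contain it, the `ç` in the second sample line is made up to exercise the failure branch, the names `l_allowed`, `l_line_ok` and `l_strip_ok` are hypothetical, and the session uses a binary `NLS_SORT` (ranges such as `a-z` may be evaluated linguistically otherwise).

```sql
DECLARE
    -- Sample data from the question; the "ç" in the second line is a made-up
    -- addition (assumption) so that the Error branch is exercised.
    l_clob           CLOB
        :=    '"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"'
           || CHR (10)
           || '"xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAMEç","08/03/2018",,"0","0","0"';
    -- Allowed characters: letters, digits, space . , ; : " ' ( ) _ & / tab hyphen.
    -- "/" is an assumption (the dates need it). The hyphen is placed last so that
    -- it is a literal character and not a range operator.
    l_allowed        CONSTANT VARCHAR2 (100)
        := 'a-zA-Z0-9 .,;:"''()_&/' || CHR (9) || '-';
    -- Whole line must consist of allowed characters only.
    l_line_ok        CONSTANT VARCHAR2 (200) := '^[' || l_allowed || ']*$';
    -- Matches a single allowed character; used to strip them for diagnostics.
    l_strip_ok       CONSTANT VARCHAR2 (200) := '[' || l_allowed || ']';
    l_offset         PLS_INTEGER := 1;
    l_total_length   PLS_INTEGER;
    l_line_length    PLS_INTEGER;
    l_line           VARCHAR2 (32767);
BEGIN
    l_total_length := LENGTH (l_clob);

    WHILE l_offset <= l_total_length
    LOOP
        -- Length up to the next line feed, or to the end of the CLOB.
        l_line_length := INSTR (l_clob, CHR (10), l_offset) - l_offset;

        IF l_line_length < 0
        THEN
            l_line_length := l_total_length + 1 - l_offset;
        END IF;

        -- Drop a trailing carriage return in case the file uses CRLF endings.
        l_line := RTRIM (SUBSTR (l_clob, l_offset, l_line_length), CHR (13));

        -- An empty line is NULL in PL/SQL, and REGEXP_LIKE on NULL is not TRUE,
        -- so empty lines are accepted explicitly.
        IF l_line IS NULL OR REGEXP_LIKE (l_line, l_line_ok)
        THEN
            DBMS_OUTPUT.put_line ('Ok: ' || l_line);
        ELSE
            DBMS_OUTPUT.put_line ('Error: ' || l_line);
            -- Remove every allowed character to show only the offending ones.
            DBMS_OUTPUT.put_line ('  unexpected: ' || REGEXP_REPLACE (l_line, l_strip_ok));
        END IF;

        l_offset := l_offset + l_line_length + 1;
    END LOOP;
END;
/
```

The match is deliberately case-sensitive (no `'i'` parameter) because both `a-z` and `A-Z` are listed explicitly. In SQL*Plus or SQL Developer, `SET SERVEROUTPUT ON` is needed to see the output, and `SET DEFINE OFF` may be needed so that `&` is not treated as a substitution variable. This only flags lines with unexpected characters; it does not detect the file's encoding.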

0 answers:

No answers