Oracle RegExp - Handling double quotes in string fields

Time: 2020-10-09 11:51:26

Tags: regex oracle csv

Oracle's REGEXP_SUBSTR function has worked well for me over the last few months. But then content like the following suddenly turned up:

"field 1 "",""","field 2","field 3", "","","","field 7"

In this case, the expected matches are (https://regex101.com/r/s2v60b/1):

field 1: "field 1 "","""
field 2: "field 2"
field 3: "field 3"
field 4: ""
field 5: ""
field 6: ""
field 7: "field 7"

Even VS Code understands what I mean, since its syntax coloring splits the fields correctly.

But when I evaluate the following query in Oracle:

SELECT 
    REGEXP_SUBSTR(
      '"field 1 "",""","field 2","field 3", "","","","field 7"' 
    , '(^|,)("((?:""|[^"])*)")', 1, 1, '', 2) TEXT 
FROM DUAL;

field 1 is truncated to "field 1 ", which shifts all the remaining fields.

Do you know what I am doing wrong, and how it could be corrected?

1 answer:

Answer 0: (score: 1)

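Oracle does not support non-capturing groups, so you cannot use (?:). Just remove the ?: to make it a capturing group and your code should work. You probably also need to add \s*, because there is white space between the comma and the starting quote.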

For example, the pattern becomes:

(^|,)\s*("((""|[^"])*)")
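As a minimal sketch, this is the query from the question with only those two changes applied: the non-capturing group is made capturing, and \s* is added before the opening quote. It assumes an Oracle version that accepts the Perl-style \s class in regular expressions.

```sql
-- Corrected version of the question's query (a sketch, not tuned for performance).
-- Subexpression 1 is (^|,), subexpression 2 is the whole quoted field.
SELECT REGEXP_SUBSTR(
         '"field 1 "",""","field 2","field 3", "","","","field 7"',
         '(^|,)\s*("((""|[^"])*)")',
         1,     -- start searching at the first character
         1,     -- return the first occurrence, i.e. field 1
         NULL,  -- no match modifiers
         2      -- return subexpression 2 (the quoted field, without the comma)
       ) AS text
FROM   DUAL;
```

Changing the fourth argument from 1 to 2, 3, and so on selects the later fields.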

For your test data, this outputs:

FIELD1          | FIELD2    | FIELD3    | FIELD4 | FIELD5 | FIELD6 | FIELD7   
:-------------- | :-------- | :-------- | :----- | :----- | :----- | :--------
"field 1 "",""" | "field 2" | "field 3" | ""     | ""     | ""     | "field 7"

Written as a query against a table with a csv column, with one call per field and only the occurrence argument changing:

SELECT REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 1, NULL, 2 ) AS field1,
       REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 2, NULL, 2 ) AS field2,
       REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 3, NULL, 2 ) AS field3,
       REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 4, NULL, 2 ) AS field4,
       REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 5, NULL, 2 ) AS field5,
       REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 6, NULL, 2 ) AS field6,
       REGEXP_SUBSTR( csv, '(^|,)\s*("((""|[^"])*)")', 1, 7, NULL, 2 ) AS field7
FROM   table_name
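
The seven calls above differ only in the occurrence argument. As a hedged alternative sketch (not part of the original answer), a CONNECT BY LEVEL row generator can return one row per field instead. It assumes REGEXP_COUNT is available (Oracle 11g or later) and uses a single literal row from DUAL, because CONNECT BY LEVEL over a multi-row table needs extra care to avoid cross-multiplying rows.

```sql
-- One output row per quoted field; LEVEL plays the role of the occurrence argument.
-- The inline view and its csv alias are illustrative stand-ins for real data.
SELECT LEVEL AS field_no,
       REGEXP_SUBSTR(csv, '(^|,)\s*("((""|[^"])*)")', 1, LEVEL, NULL, 2) AS field_value
FROM   (
         SELECT '"field 1 "",""","field 2","field 3", "","","","field 7"' AS csv
         FROM   DUAL
       )
CONNECT BY LEVEL <= REGEXP_COUNT(csv, '(^|,)\s*("((""|[^"])*)")');
```

This trades a fixed set of columns for a variable number of rows, which may be more convenient when the number of fields is not known in advance.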

For the sample data, a pattern that matches both quoted and unquoted values:

SELECT REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 1, NULL, 1 ) AS field1,
       REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 2, NULL, 1 ) AS field2,
       REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 3, NULL, 1 ) AS field3,
       REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 4, NULL, 1 ) AS field4,
       REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 5, NULL, 1 ) AS field5,
       REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 6, NULL, 1 ) AS field6,
       REGEXP_SUBSTR( csv, '([^",]*|"([^"]|"")*")(,|$)', 1, 7, NULL, 1 ) AS field7
FROM   table_name

Output:

FIELD1          | FIELD2    | FIELD3    | FIELD4          | FIELD5   | FIELD6              | FIELD7         
:-------------- | :-------- | :-------- | :-------------- | :------- | :------------------ | :--------------
"field 1 "",""" | "field 2" | "field 3" | ""              | ""       | ""                  | "field 7"      
"field 1.1"     | 2.1       | "3.1"     | "field ""4"".1" | field5.1 | "field ""6"".""1""" | """field 7.1"""

db<>fiddle here
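
The values in the output above keep their enclosing quotes and their doubled inner quotes. If the unescaped text is wanted, one possible post-processing step (an assumption on my part, not something the original answer covers) is to strip the outer quotes and then collapse each "" into a single ". The raw_value alias and the three literals below are illustrative, taken from the sample output.

```sql
-- Hypothetical post-processing: remove the enclosing quotes if present,
-- then turn every doubled quote ("") into a single quote (").
SELECT raw_value,
       REPLACE(REGEXP_REPLACE(raw_value, '^"(.*)"$', '\1'), '""', '"') AS unescaped
FROM   (
         SELECT '"field ""4"".1"' AS raw_value FROM DUAL UNION ALL
         SELECT 'field5.1'                     FROM DUAL UNION ALL
         SELECT '"""field 7.1"""'              FROM DUAL
       );
-- unescaped: field "4".1
--            field5.1
--            "field 7.1"
```

Unquoted values pass through unchanged. A value of "" becomes NULL, since Oracle treats the empty string as NULL.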