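REGEXP to insert special characters, not remove them

Time: 2017-04-14 14:53:36

Tags: sql oracle substr regexp-substr

How can I put double quotes around the two fields that are missing them? Can I do it in a single statement using INSTR / SUBSTR / REPLACE?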

string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';

Expected string := '"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';

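Please advise! Thanks.

3 answers:

Answer 0: (score: 1)

This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.

A rather brute-force approach for the interior fields is: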

replace(replace(string, ',', '","'), '""', '"')

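This adds double quotes on both sides of every comma, then collapses each resulting pair of double quotes into one. You don't need to worry about "": it becomes """" and then goes back to "".

This can be adapted for the first and last fields as well, but it complicates the expression.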
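As a small illustration of the nested REPLACE above, the sketch below runs it on three made-up inputs. The string literals and column aliases are assumptions chosen for the demonstration, not data from the question. The expected values in the comments were traced by hand from the two REPLACE steps.

```sql
-- Illustrative only: the input literals are invented test values.
SELECT
  -- An unquoted interior field gets its quotes.
  REPLACE(REPLACE('"A",1,"B"',    ',', '","'), '""', '"') AS interior_fixed, -- "A","1","B"
  -- An empty quoted field becomes """" and collapses back to "".
  REPLACE(REPLACE('"A","","B"',   ',', '","'), '""', '"') AS empty_kept,     -- "A","","B"
  -- A comma inside a quoted field is split, and the unquoted last field
  -- gets no closing quote.
  REPLACE(REPLACE('"A","X, Y",1', ',', '","'), '""', '"') AS comma_split     -- "A","X"," Y","1
FROM dual;
```

The third column shows both limitations mentioned in this answer: the embedded comma splits one field into two, and an unquoted first or last field is not fully handled.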

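Answer 1: (score: 0)

This method makes 2 passes over the string. The first pass looks for a group made of a double quote followed by a comma, followed by a character that is not a double quote. It replaces the match with the shorthand for the first group, '\1', then the missing double quote, then the second group, '\2'. The second pass does the same thing the other way around. Sure, you could nest the regexp_replace calls and end up with one big ugly statement, but keeping it as 2 statements makes it easier to maintain. The person who works on this after you will thank you, and it is ugly enough as it is.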

with orig(str) as (
  select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
  from dual
),
-- Pass 1: after a quote-comma, add the missing opening quote.
rpl_first(str) as (
  select regexp_replace(str, '(",)([^"])', '\1"\2')
  from orig
)
-- Pass 2: before a comma-quote, add the missing closing quote.
select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
from rpl_first;

FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"
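For comparison, here is a minimal sketch of the nested single-statement form that this answer mentions but advises against. The input literal is an invented test value, not the asker's data, and the expected result in the comment was traced by hand rather than taken from a database session.

```sql
-- Illustrative only: the input literal is an invented test value.
WITH orig(str) AS (
  SELECT '"A",1.5,"B, C","",42,"D"' FROM dual
)
SELECT REGEXP_REPLACE(
         -- Inner call (pass 1): add the missing opening quote.
         REGEXP_REPLACE(str, '(",)([^"])', '\1"\2'),
         -- Outer call (pass 2): add the missing closing quote.
         '([^"])(,")', '\1"\2'
       ) AS fixed_string  -- expected: "A","1.5","B, C","","42","D"
FROM orig;
```

Like the two-statement version, this only handles unquoted fields that sit between two quoted fields; an unquoted first or last field is left alone.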

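Edit: Changed the regexes and added a third step to allow for empty, unquoted fields, per Unoembre's comment. Good catch! Also added more test cases. Always expect the unexpected, and make sure you add test cases for every combination of data.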

with orig(str) as (
  select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
  from dual union
  select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
  select '"ES26653","ABCBEVERAGES",861526999728' from dual union
  select '1S26653,"ABCBEVERAGES",861526999728' from dual union
  select '"ES26653",,861526999728' from dual
),
-- Step 1: turn an empty, unquoted field (two adjacent commas) into "".
rpl_empty(str) as (
  select regexp_replace(str, ',,', ',"",')
  from orig
),
-- Step 2: add a missing opening quote after quote-comma or at start of string.
rpl_first(str) as (
  select regexp_replace(str, '(",|^)([^"])', '\1"\2')
  from rpl_empty
)
-- Step 3: add a missing closing quote before comma-quote or at end of string.
select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
from rpl_first;

FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"

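Answer 2: (score: 0)

This offering attempts to address some of the edge cases:

  • Handle the first and last fields. Only the last field is a special case here, because we look for end of string $ instead of a comma.
  • Handle empty unquoted fields, i.e. leading, consecutive and trailing commas.
  • Preserve a pair of double quotes inside a field, which represents a single double quote.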

SQL:

WITH orig(str) AS (
     SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
     FROM dual
   ),
   rpl_first(str) AS (
     SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5') 
   FROM orig
   )
   SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
   FROM rpl_first;
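To try the same two-step query on the edge cases listed above, a sketch like the following could be used. The input literal is an invented test value: it contains a quoted field with an escaped pair of quotes, an empty unquoted field, an unquoted number, and a quoted last field with an embedded comma. No result is shown because it has not been run against a database here.

```sql
-- Illustrative only: the input literal is an invented test value.
WITH orig(str) AS (
  SELECT '"say ""hi""",,5,"x, y"' FROM dual
),
rpl_first(str) AS (
  -- Same pattern and replacement as in the answer's query.
  SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)', '"\2\4"\5')
  FROM orig
)
-- Same trailing-quote cleanup as in the answer's query.
SELECT REGEXP_REPLACE(str, '"""$', '"') AS fixed_string
FROM rpl_first;
```

If the approach behaves as described, every field should come back quoted, with the doubled quotes around hi left intact and the comma inside the last field untouched.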

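The technique is to find either a quoted field and remember it, or an unquoted field and remember it, terminated by a comma or end of string, which is also remembered. The replacement is then a ", followed by whichever field was found, followed by a ", and then the terminator.

A quoted field is basically "[^"]*", where [^"] is any character that is not a quote and * repeats it zero or more times. This is complicated by the fact that a non-quote character can also be a pair of quotes, so we need an OR construct (|), i.e. "([^"]|"")*". We also have to remember the field inside the quotes, so we add parentheses in order to back-reference it later: "(([^"]|"")*)".

An unquoted field is simply a non-comma repeated zero or more times, and we want to remember it: ([^,]*).

We want to find either of these, so the OR construct is used again, i.e. ("(([^"]|"")*)"|([^,]*)). This is followed by the terminator, which is either a comma or end of string, and which we also want to remember, i.e. (,|$).

Now we can replace the match with whichever of the two field types was found, enclosed in quotes and followed by the terminator, i.e. "\2\4"\5. Only one of \2 and \4 takes part in any given match and the other is empty, so concatenating them yields the field body. The number n in a back-reference \n is just a matter of counting opening parentheses.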
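The group numbering can be inspected directly with REGEXP_SUBSTR, whose sixth argument selects a subexpression. The sketch below is only an illustration: the input literal, the CONNECT BY row generator and the column aliases are assumptions, and the input deliberately contains no empty fields so that each occurrence is a non-empty match.

```sql
-- Illustrative only: the input literal is an invented test value with 3 fields.
WITH t(str) AS (
  SELECT '"A, B",12,"C"' FROM dual
)
SELECT LEVEL AS field_no,
       -- Group 2: body of a quoted field (NULL when the field is unquoted).
       REGEXP_SUBSTR(str, '("(([^"]|"")*)"|([^,]*))(,|$)', 1, LEVEL, NULL, 2) AS quoted_body,
       -- Group 4: body of an unquoted field (NULL when the field is quoted).
       REGEXP_SUBSTR(str, '("(([^"]|"")*)"|([^,]*))(,|$)', 1, LEVEL, NULL, 4) AS unquoted_body,
       -- Group 5: the terminator, a comma or nothing at end of string.
       REGEXP_SUBSTR(str, '("(([^"]|"")*)"|([^,]*))(,|$)', 1, LEVEL, NULL, 5) AS terminator
FROM t
CONNECT BY LEVEL <= 3;
```

For each field, exactly one of quoted_body and unquoted_body should be populated, which is why the replacement string can simply write \2\4.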

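The second REGEXP_REPLACE works around what I suspect is an Oracle bug. If the last field is quoted, an extra pair of quotes is added at the end of the string. This suggests that end of string is being processed twice during parsing, which would be a bug. However, regular expression processing is probably done by a standard library routine, so it may be my interpretation of the regex rules that is at fault. Comments are welcome.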
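To look at this behaviour in isolation, the first pass can be run on its own, without the cleanup step, against a quoted and an unquoted last field. The input literals and column aliases below are invented for the test, and no output is shown because the result depends on how the engine treats a match at end of string.

```sql
-- Illustrative only: run the first pass alone and compare the string endings.
SELECT REGEXP_REPLACE('"A","B"', '("(([^"]|"")*)"|([^,]*))(,|$)', '"\2\4"\5') AS quoted_last,
       REGEXP_REPLACE('"A",B',   '("(([^"]|"")*)"|([^,]*))(,|$)', '"\2\4"\5') AS unquoted_last
FROM dual;
```

One possible explanation, offered as an assumption rather than a confirmed fact about Oracle: the pattern can match the empty string, because [^,]* may match zero characters and $ matches at end of input. If the engine attempts one more match at the end position after consuming the last field, that empty match would be replaced by "", which would account for the extra pair of quotes.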

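The Oracle regular expression documentation can be found at Using Regular Expressions in Database Applications.

Thanks to @Gary_W for the template. Here I keep two separate regex blocks to separate the bit I can explain from the bit I can't (the bug?).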