Regex to select newlines between quotes

Time: 2019-03-25 11:52:05

Tags: regex ruby

I have a string in Ruby that looks something like this:

{
  "a boolean": true,
  "multiline": "
my
multiline
value
",
  "a normal key": "a normal value"
}

I want to match only the newlines inside this substring:

"
my
multiline
value
",

so that I can replace them with escaped newlines. The long-term goal is to make the JSON easy to work with.

5 answers:

Answer 0 (score: 2)

Update - these regexes work fine.
From @faissaloo: "it seemed to fail however on my large JSON"
I ran this large string through both regexes:
 PCRE https://regex101.com/r/3jtqea/1
 Ruby https://regex101.com/r/1HVCCC/1
Both behave the same, with no flaws.
If you have any other questions, let me know.


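I think Ruby supports Perl-like constructs.
If so, this can be done in a single global find and replace.
Like this:

Edit - Ruby does not support the backtracking control verbs (*SKIP)(*FAIL).
So to do this in Ruby code, the regex needs to be more explicit.
With a few modifications to the PCRE / Perl regex, the Ruby equivalent is:

Ruby
Find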

(?-m)((?!\A)\G|(?:(?>[^"]*"[^"\r\n]*"[^"]*))*")([^"\r\n]*)\K\r?\n(?=[^"]*")((?:[^"\r\n]*"(?:(?>[^"]*"[^"\r\n]*"))*[^"]*)?)

Replace

\\n\3

https://regex101.com/r/BaqjEE/1
https://rextester.com/NVFD38349

Explanation (it is complex, though)

 (?-m)                                    # Non-multiline mode safety check
 (                                        # (1 start), Prefix. Capture for debug
      (?! \A )                                 # Not BOS
      \G                                       # Test where last match left off

   |                                         # or, 
      (?:                                      # Optionally align to next " ( only used once )
           (?> [^"]* " [^"\r\n]* " [^"]* )
      )*

      "                                        # A new quote to test
 )                                        # (1 end)

 ( [^"\r\n]* )                            # (2), Line break Preamble. Capture for debug
 \K                                       # Exclude from the match (group 0) up to this point

 \r? \n                                   # Line break to escape

 (?= [^"]* " )                            # Validate we have " closure

 (                                        # (3 start), Optional end quote and alignment.
                                               # To be written back.
      (?:
           [^"\r\n]* "                   
           (?:                                      # Optionally align to next "
                (?> [^"]* " [^"\r\n]* " )
           )*
           [^"]* 
      )?
 )                                        # (3 end)


 # Ruby Code:
 #----------------------
 # #ruby 2.3.1 
 # 
 # re = /(?-m)((?!\A)\G|(?:(?>[^"]*"[^"\r\n]*"[^"]*))*")([^"\r\n]*)\K\r?\n(?=[^"]*")((?:[^"\r\n]*"(?:(?>[^"]*"[^"\r\n]*"))*[^"]*)?)/
 # str = '{
 #   "a boolean": true,
 #   "a boolean": true,
 #   "a boolean": true,
 #   "a boolean": true,
 #   "multiline": "
 # my
 # multiline
 # value
 # asdf"
 # ,
 # 
 # "a multiline boo
 # lean": true,
 # "a normal key": "a multiline
 # 
 # value"
 # }'
 # subst = '\\n\3'
 # 
 # result = str.gsub(re, subst)
 # 
 # # Print the result of the substitution
 # puts result
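
The two least familiar constructs in the Ruby pattern are probably \K and \G. As a minimal sketch of each in isolation (the method names and sample strings are illustrative and not part of the original answer):

```ruby
# \K discards everything matched so far from the reported match,
# so only the text after it is replaced.
def replace_value_only(text)
  text.sub(/key=\Kold/, 'new')
end

# \G anchors at the position where the previous match ended
# (the start of the string for the first match), so gsub stops
# at the first position that does not continue the chain.
def underscore_leading_spaces(text)
  text.gsub(/\G /, '_')
end

puts replace_value_only('key=old')           # key=new
puts underscore_leading_spaces('    a b c')  # ____a b c
```

In the answer's pattern, \K keeps the already-scanned prefix out of the replacement, and \G lets each following match continue inside the same quoted string.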

For PCRE / Perl
Find

(?:((?:(?>[^"]*"[^"\n]*"[^"]*))+(*SKIP)(*FAIL)|"|(?!^)\G)([^"\n]*)\K\n(?=[^"]*")((?:[^"\n]*")?))

Replace

\\n$3

https://regex101.com/r/06naae/1

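Explanation (it is complex, though)
Note that if you are on a Windows box where the editor requires CRLF line breaks,
add a \r in front of the LF, like this: \r\n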

 (?:
      (                             # (1 start), Prefix capture, for debug
           (?:
                (?> [^"]* " [^"\n]* " [^"]* )
           )+
           (*SKIP) (*FAIL)               # Consume false positives, but ignore them
                                         # (need this to align next ")
        |                              # or,
           "                             # A new quote to test
        |                              # or, 
           (?! ^ )                       # Not BOS
           \G                            # Test where last match left off
      )                             # (1 end)

      ( [^"\n]* )                   # (2), Preamble capture, for debug
      \K                            # Exclude from the match (group 0) up to this point
      \n                            # Line break to escape
      (?= [^"]* " )                 # Validate we have " closure
      (                             # (3 start), End quote, to be written back
           (?: [^"\n]* " )?
      )                             # (3 end)
 )

Answer 1 (score: 1)

I think this can help you. It captures the \n inside the string, which you can then replace:

"[^"]*(\n)*",

Test it
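
The pattern above consumes the whole quoted value, so a plain replacement string would overwrite the text along with the newlines. One way to act on only the newlines is the block form of gsub: match each quoted span, then escape the line breaks inside it. A minimal sketch, assuming the strings never contain escaped quotes (\"); the method name escape_newlines_in_quotes is hypothetical:

```ruby
require 'json'

# Escape raw line breaks that occur inside double-quoted spans.
# Assumption: no escaped quotes (\") appear inside the strings.
def escape_newlines_in_quotes(text)
  text.gsub(/"[^"]*"/) do |quoted|
    # The block's return value is inserted literally, so '\\n'
    # yields a backslash followed by the letter n.
    quoted.gsub(/\r?\n/) { '\\n' }
  end
end

broken = '{
  "a boolean": true,
  "multiline": "
my
multiline
value
",
  "a normal key": "a normal value"
}'

fixed = escape_newlines_in_quotes(broken)
puts fixed
p JSON.parse(fixed)['multiline']
```

Newlines between tokens are left alone because they never fall inside a matched quoted span.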

Answer 2 (score: 0)

Another option is something like this:

string = '{
  "a boolean": true,
  "multiline": "my
multiline
value",
  "a normal value"
}'

puts string.match(/"(\w+)(\n+\w*)+"/).to_s.gsub("\n", '\n')

This matches the regex against your string, then replaces the newlines in the match with escaped newlines.

Answer 3 (score: 0)

Late answer, but you can use a regex like this:

'"(?=\n).*?"'

Match:

"
    my
        multiline
    value
        ",

Demo:

  1. Regex Demo & Explanation
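
For the match to span several lines, the dot has to match newlines; in Ruby that is the /m flag (presumably the demo had the equivalent single-line option enabled). A short sketch, with a hypothetical method name first_multiline_literal:

```ruby
# Return the first quoted span whose opening quote is immediately
# followed by a newline, or nil when there is none.
# The /m flag makes "." match newlines in Ruby.
def first_multiline_literal(text)
  text[/"(?=\n).*?"/m]
end

sample = "{\n  \"multiline\": \"\nmy\nvalue\n\",\n  \"a normal key\": \"a normal value\"\n}"
p first_multiline_literal(sample)
```

This only finds values whose opening quote sits at the end of a line; a value that starts with text on the same line as its opening quote is not matched.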

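Answer 4 (score: 0)

If your multiline strings do not contain a comma right before a line break, you can use the fact that in JSON every line must end with , { or [, or else the next line must start with } or ]: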

json_string.gsub(/(?<!,|\{|\[)\n(?!\s*[}\]])/, '\n')

If the strings do contain commas (or braces and brackets), you can improve this approach by adding more detail to the list of valid line endings:

valid_line_ends = %w(true, false, ", }, ], { [)
line_end_matcher = valid_line_ends.map(&Regexp.method(:escape)).join('|')
json_string.gsub(/(?<!#{line_end_matcher})\n(?!\s*[}\]])/, '\n')
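
As a rough end-to-end check of this heuristic, the sketch below applies the extended pattern to a value that has a comma right before its line break and then parses the result. The sample string and the name escape_stray_newlines are illustrative; the block form of gsub is used only so the replacement is taken literally.

```ruby
require 'json'

# Line endings treated as legitimate JSON structure (from the answer).
VALID_LINE_ENDS = %w(true, false, ", }, ], { [).freeze
LINE_END_MATCHER = VALID_LINE_ENDS.map { |s| Regexp.escape(s) }.join('|')

# A newline is "stray" when it is not preceded by a valid line end
# and the next line does not start with } or ].
STRAY_NEWLINE = /(?<!#{LINE_END_MATCHER})\n(?!\s*[}\]])/

def escape_stray_newlines(json_string)
  json_string.gsub(STRAY_NEWLINE) { '\\n' }
end

sample = '{
  "a boolean": true,
  "multiline": "first,
second",
  "a normal key": "a normal value"
}'

fixed = escape_stray_newlines(sample)
puts fixed
p JSON.parse(fixed)['multiline']
```

The list is still a heuristic: a line ending such as a number followed by a comma is not in it, so real-world input may need more entries.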