regex for string with backslash for escape

Time: 2017-08-05 10:55:06

Tags: java regex

I'm trying to come up with a pattern that finds all text between double or single quotation marks in Java source code. This is what I have:

"(.*?)"|'(.*?)'

This works for almost every case, I think, except one:

"text\"moretext\"evenmore"

This is a valid String literal, because the inner quotes are escaped. However, the pattern does not recognize the inner part moretext.
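To make the failure concrete, here is a minimal sketch that runs the pattern above against that input. The class name NaiveQuoteDemo and the helper findAll are made up for illustration; only java.util.regex from the standard library is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveQuoteDemo {
    // The lazy pattern from the question, written as a Java string literal.
    static final Pattern NAIVE = Pattern.compile("\"(.*?)\"|'(.*?)'");

    // Collects every full match (quotes included) found in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = NAIVE.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Source text being scanned: "text\"moretext\"evenmore"
        String source = "\"text\\\"moretext\\\"evenmore\"";
        // The lazy .*? stops at the first quote it sees, escaped or not,
        // so the literal is split and moretext falls between two matches.
        for (String match : findAll(source)) {
            System.out.println(match);
        }
    }
}
```

The pattern stops at the first quote it meets, so it reports "text\" and "evenmore" as two separate strings and skips moretext entirely.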

Any ideas for a pattern that accounts for this case?

2 answers:

Answer 0 (score: 5)

You can use this regex to match a single- or double-quoted string while skipping over escaped quotes:

(["'])([^\\]*?(?:\\.[^\\]*?)*)\1

RegEx Breakdown:

  • (["']): Match single or double quote and capture it in group #1
  • (: Start Capturing group #2
    • [^\\]*?: Match 0 or more characters that are not a \ (lazily)
    • (?:: Start non-capturing group
      • \\: Match a \
      • .: Followed by any character that is escaped
      • [^\\]*?: Followed by 0 or more of any non-\ characters
    • )*: End non-capturing group. Match 0 or more of this non-capturing group
  • ): End capturing group #2
  • \1: Match the closing quote, which must be the same quote character captured in group #1
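As a rough sketch of how this pattern could be used from Java: every backslash in the regex has to be doubled again inside a Java string literal, which is easy to get wrong. The class name BackrefQuoteMatcher and the method extractContents are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefQuoteMatcher {
    // Regex: (["'])([^\\]*?(?:\\.[^\\]*?)*)\1
    // Each regex backslash becomes two backslashes in the Java literal.
    static final Pattern QUOTED =
            Pattern.compile("([\"'])([^\\\\]*?(?:\\\\.[^\\\\]*?)*)\\1");

    // Returns the text between the quotes (group #2) of every match.
    static List<String> extractContents(String input) {
        List<String> contents = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // Source text being scanned: "text\"moretext\"evenmore"
        String source = "\"text\\\"moretext\\\"evenmore\"";
        for (String content : extractContents(source)) {
            System.out.println(content);
        }
    }
}
```

Because the closing quote is matched with the backreference \1, a double-quoted string may contain unescaped single quotes (and vice versa) without ending the match early.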

Answer 1 (score: 2)

This should work:

"([^"\\]|\\.)*"|'([^'\\]|\\.)*'

Explanation:

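  1. " matches the opening ".
  2. [^"\\]|\\. matches either a single character that is neither " nor \, or a \ followed by any character, so an escaped quote \" is consumed as a unit instead of ending the string.
  3. * repeats that alternative zero or more times.
  4. " matches the closing ".

The second half of the pattern does the same for '.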
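A minimal Java sketch of this pattern follows, assuming the whole literal (quotes included) is what you want back. The names QuotedLiteralFinder and findLiterals are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLiteralFinder {
    // Regex: "([^"\\]|\\.)*"|'([^'\\]|\\.)*'
    static final Pattern LITERAL =
            Pattern.compile("\"([^\"\\\\]|\\\\.)*\"|'([^'\\\\]|\\\\.)*'");

    // Returns every quoted literal found in the input, quotes included.
    static List<String> findLiterals(String input) {
        List<String> literals = new ArrayList<>();
        Matcher m = LITERAL.matcher(input);
        while (m.find()) {
            // Use the whole match: a repeated capturing group only keeps
            // its last iteration, so group(1) would hold a single character.
            literals.add(m.group());
        }
        return literals;
    }

    public static void main(String[] args) {
        // Source text being scanned: "text\"moretext\"evenmore"
        String source = "\"text\\\"moretext\\\"evenmore\"";
        for (String literal : findLiterals(source)) {
            System.out.println(literal);
        }
    }
}
```

To get only the contents, strip the first and last character of each match rather than reading group 1 or 2.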