Question

Let's say, for example, there is a Python source code file like:

def someStuff():
  return "blabla"

myThing = "Bob told me: \"Hello there!\""

twoStrings = "first part " + "second part"

How would I write a regular expression to match:

"blabla", "Bob told me: \"Hello there!\"", "first part ", & "second part"

including the surrounding quotes?

Originally, I figured this could be done simply with \"[^\"]*\" but this fails to account for strings that contain an escaped quote (\"). I've also tried incorporating negative lookbehinds:

(?<!\\)\"[^\"]*(?<!\\)\"

but have not had any success. What would be the recommended way to handle this?

Answer 1

This regex (used with the single-line modifier s) should match all kinds of string literals:

([bruf]*)("""|'''|"(?!")|'(?!'))(?:(?!\2)(?:\\.|[^\\]))*\2

It supports triple-quoted strings and escape sequences, and it also captures prefixes such as r, u, f and b. See the online demo.

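The single-line modifier s is required to match multi-line strings correctly. In addition, enabling the i modifier lets it match uppercase prefixes such as R'nobody uses capitalized prefixes anyways'.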
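As a rough illustration of what the two modifiers change, here is a small sketch using Python's re module, where s is re.DOTALL and i is re.IGNORECASE. The helper name find_literals and the sample inputs are made up for this example. Note that in Python the negated class [^\\] already matches newlines on its own, so the s flag seems to matter mainly when a backslash is directly followed by a newline (a line continuation inside the string).

```python
import re

# The pattern from this answer, split into pieces only to avoid quoting clashes.
PATTERN = (
    r'([bruf]*)'                   # optional prefix characters
    r'("""|'                       # opening quote: three double quotes,
    r"'''|"                        # three single quotes,
    r'"(?!")|'                     # one double quote not followed by another,
    r"'(?!'))"                     # or one single quote not followed by another
    r'(?:(?!\2)(?:\\.|[^\\]))*'    # body: escaped characters or non-backslashes
    r'\2'                          # closing quote, same as the opening one
)


def find_literals(source, flags=0):
    """Return every substring of `source` matched by PATTERN (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(PATTERN, source, flags)]


# A string containing a backslash followed by a real newline.
continued = 's = "abc\\\ndef"'
print(find_literals(continued))              # no match without the s flag
print(find_literals(continued, re.DOTALL))   # the whole literal matches

# An uppercase prefix is only captured when the i flag is enabled.
upper = "x = R'nobody uses capitalized prefixes anyways'"
print(find_literals(upper))                  # literal without the leading R
print(find_literals(upper, re.IGNORECASE))   # literal including the leading R
```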

As far as I know, there are two caveats:

It also matches bytes literals.
It matches string literals inside comments.
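If the second caveat is a problem, one alternative (not part of this answer's regex approach) is to let Python's own tokenizer find the literals, since it separates comments from code. The sketch below uses the standard tokenize module; the function name string_literals and the sample input are assumptions for illustration. Bytes literals still show up as string tokens, so the first caveat would need a check on the prefix. Also be aware that on Python 3.12 and later, f-strings are reported as separate FSTRING_* tokens rather than a single STRING token, so this sketch would not return them.

```python
import io
import tokenize


def string_literals(source):
    """Return the text of every STRING token in `source`, ignoring comments."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]


sample = 'x = "real"  # a "quoted" word in a comment\ny = b"bytes"\n'
print(string_literals(sample))   # ['"real"', 'b"bytes"']
```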

Explanation of the regex:

([bruf]*)                # match and capture any prefix characters
("""|'''|"(?!")|'(?!'))  # match the opening quote
(?:                      # as many times as possible...
    (?!\2)               # ...as long as there's no closing quote... 
    (?:                  # ...match either...
        \\.              # ...a backslash and the character after it
    |                    # ...or...
        [^\\]            # ...a single non-backslash character
    )        
)*
\2                       # match the closing quote
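The commented layout above maps fairly directly onto Python's re.VERBOSE mode. Here is a minimal sketch that applies it to the source from the question. The name STRING_LITERAL is made up, and '{3} is used in place of three literal single quotes only so the pattern can sit inside a triple-quoted Python string; it should be equivalent.

```python
import re

STRING_LITERAL = re.compile(
    r'''
    ([bruf]*)                  # match and capture any prefix characters
    ("""|'{3}|"(?!")|'(?!'))   # match the opening quote
    (?:                        # as many times as possible...
        (?!\2)                 # ...as long as the closing quote does not follow...
        (?:\\.|[^\\])          # ...match an escaped character or a non-backslash
    )*
    \2                         # match the closing quote
    ''',
    re.VERBOSE | re.DOTALL,
)

# The example file from the question, kept raw so the backslashes survive.
source = r'''
def someStuff():
  return "blabla"

myThing = "Bob told me: \"Hello there!\""

twoStrings = "first part " + "second part"
'''

matches = [m.group(0) for m in STRING_LITERAL.finditer(source)]
for literal in matches:
    print(literal)
```

This prints the four literals asked for in the question, quotes included. One thing worth checking against your own input: because of the (?!") and (?!') lookaheads, an empty literal such as "" does not match as written.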

Answer 2

Use a negative lookbehind:

".*?(?<!\\)"

This uses a lazy quantifier (*?) to match up to the next quote ("), as long as that quote is not escaped by a backslash (\"). Compare with the simpler (but erroneous) regex ".*?".
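To see the difference between the two patterns, here is a short Python sketch run against the myThing line from the question. The variable names and the last sample input are invented for this example.

```python
import re

source = r'myThing = "Bob told me: \"Hello there!\""'

# The simpler pattern stops at the first quote it sees, escaped or not.
naive = re.findall(r'".*?"', source)
print(naive)        # two fragments instead of one literal

# The lookbehind version skips quotes that are preceded by a backslash.
lookbehind = re.findall(r'".*?(?<!\\)"', source)
print(lookbehind)   # the whole literal, inner \" included

# Assumed edge case: a string that ends with an escaped backslash.
tricky = re.findall(r'".*?(?<!\\)"', r'path = "C:\\" + "x"')
print(tricky)       # runs past the real closing quote
```

The last case suggests the lookbehind cannot tell an escaped quote from a quote that follows an escaped backslash, so it is probably best suited to input where strings never end in \\.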

Regex to Match String Syntax in Code
