Let's say, for example, there is a Python source code file like:
def someStuff():
    return "blabla"
myThing = "Bob told me: \"Hello there!\""
twoStrings = "first part " + "second part"
How would I write a regular expression that matches each of these, including the surrounding quotes?
"blabla"
"Bob told me: \"Hello there!\""
"first part "
"second part"
Originally, I figured this could be done simply with \"[^\"]*\", but that fails whenever the string itself contains an escaped quote (\"). I've also tried incorporating negative lookbehinds:
(?<!\\)\"[^\"]*(?<!\\)\"
but haven't had any success. What would be the recommended way to handle this?
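A quick check with Python's re module shows how both attempts go wrong on the sample file. This is a sketch, and SOURCE, naive and lookbehind are illustrative names. Because [^"]* refuses to step over any quote, escaped or not, the lookbehinds cannot make it skip the escaped ones. All they do is reject some quotes as delimiters, and that causes the remaining quotes to pair up wrongly.

```python
import re

# The sample file from the question; a raw string keeps the \" escapes intact.
SOURCE = r'''def someStuff():
    return "blabla"
myThing = "Bob told me: \"Hello there!\""
twoStrings = "first part " + "second part"
'''

# Naive attempt: stops at the first quote, so the escaped string gets split.
naive = re.findall(r'"[^"]*"', SOURCE)

# Lookbehind attempt: [^"]* still cannot cross an escaped quote, so the
# match starting at the escaped string fails and later quotes pair up wrongly.
lookbehind = re.findall(r'(?<!\\)"[^"]*(?<!\\)"', SOURCE)

print(naive)
print(lookbehind)
```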
Answer 0 (score: 1)
This regex (used with the single-line modifier s) should match every kind of string literal:
([bruf]*)("""|'''|"(?!")|'(?!'))(?:(?!\2)(?:\\.|[^\\]))*\2
It supports triple-quoted strings and escape sequences, and it also captures prefixes such as r, u, f, and b. See the online demo.
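The single-line modifier s is needed for multi-line strings. Without it, the . in \\. does not match a newline, so a backslash at the end of a line inside a string breaks the match. Enabling the i modifier also lets it match capitalized prefixes, as in R'nobody uses capitalized prefixes anyways'.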
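Here is a minimal sketch of both modifiers at work, assuming Python's re module, where s and i correspond to re.DOTALL and re.IGNORECASE. The pattern is split across adjacent raw-string literals only because it contains both """ and '''. SAMPLE is hypothetical test input.

```python
import re

# The pattern above, split into pieces so it fits in Python string literals.
# Group 1 captures the prefix, group 2 the opening quote.
STRING_LITERAL = re.compile(
    r"([bruf]*)"
    r"(\"\"\"|'''|\"(?!\")|'(?!'))"
    r"(?:(?!\2)(?:\\.|[^\\]))*"
    r"\2",
    re.DOTALL | re.IGNORECASE,  # the s and i modifiers
)

# Hypothetical input covering prefixes, a capitalized prefix, a triple-quoted
# string over two lines, and a backslash-newline inside a normal string.
SAMPLE = "\n".join([
    "a = rb'raw'",          # two-letter prefix
    'b = """line one',      # triple-quoted string spanning two lines
    'line two"""',
    "c = R'caps'",          # capitalized prefix (needs re.IGNORECASE)
    'd = "x\\',             # backslash at the end of the line...
    'y"',                   # ...continues the string (needs re.DOTALL)
])

for m in STRING_LITERAL.finditer(SAMPLE):
    print(repr(m.group(1)), repr(m.group(0)))

# Same pattern without re.DOTALL, for comparison.
without_s = re.compile(STRING_LITERAL.pattern, re.IGNORECASE)
print([m.group(0) for m in without_s.finditer(SAMPLE)])
```

Without re.DOTALL the triple-quoted string is still found, because [^\\] matches newlines anyway. Only the "x\<newline>y" literal is lost. One more case is worth testing separately. As written, "(?!") rejects the first quote of an empty literal "", so input such as "" + "b" can end up paired as " + ".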
As far as I know, there are two caveats:
Explanation of the regex:
([bruf]*) # match and capture any prefix characters
("""|'''|"(?!")|'(?!')) # match the opening quote
(?: # as many times as possible...
(?!\2) # ...as long as there's no closing quote...
(?: # ...match either...
\\. # ...a backslash and the character after it
| # ...or...
[^\\] # ...a single non-backslash character
)
)*
\2 # match the closing quote
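The annotated form above is a valid verbose-mode pattern, so it can be compiled directly with re.VERBOSE. This is a sketch assuming Python's re module. The only change is that the triple double quote is written as \"\"\" so that it does not end the raw triple-quoted Python literal. Here it is applied to the file from the question:

```python
import re

# The annotated pattern, compiled as-is; re.VERBOSE ignores the layout
# whitespace and the "# ..." comments.
STRING_LITERAL = re.compile(r"""
    ([bruf]*)                      # match and capture any prefix characters
    (\"\"\"|'''|"(?!")|'(?!'))     # match the opening quote
    (?:                            # as many times as possible...
        (?!\2)                     # ...as long as there's no closing quote...
        (?:                        # ...match either...
            \\.                    # ...a backslash and the character after it
            |                      # ...or...
            [^\\]                  # ...a single non-backslash character
        )
    )*
    \2                             # match the closing quote
""", re.VERBOSE | re.DOTALL | re.IGNORECASE)

SOURCE = r'''def someStuff():
    return "blabla"
myThing = "Bob told me: \"Hello there!\""
twoStrings = "first part " + "second part"
'''

# group(0) is the whole literal; findall() would return the two groups instead.
print([m.group(0) for m in STRING_LITERAL.finditer(SOURCE)])
```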
Answer 1 (score: 0)
Use a negative lookbehind:
This uses a lazy quantifier (*?) to match up to the next quote (") as long as that quote is not escaped by a backslash (\"). Compare it with the simpler (but erroneous) regex ".*?", which stops at the first quote, escaped or not.
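Assume the pattern described here has the form ".*?(?<!\\)": a quote, a lazy .*?, and then a quote not preceded by a backslash. Under that assumption, a quick Python check looks like this. The tricky input is a hypothetical case added to show a limitation of this approach.

```python
import re

# Assumed form of the lookbehind regex: lazily match up to the first quote
# that is NOT immediately preceded by a backslash.
LOOKBEHIND = re.compile(r'".*?(?<!\\)"')
NAIVE = re.compile(r'".*?"')  # the simpler regex it is compared with

line = r'myThing = "Bob told me: \"Hello there!\""'
print(NAIVE.findall(line))       # stops at the first (escaped) quote
print(LOOKBEHIND.findall(line))  # skips the escaped quotes

# Limitation: in "C:\\" the closing quote follows an escaped backslash, so the
# lookbehind wrongly treats it as escaped and runs on to the next quote.
tricky = r'path = "C:\\" + "x"'
print(LOOKBEHIND.findall(tricky))
```

The lookbehind inspects only one preceding character, so it cannot tell \" from \\". The \\. alternation in the first answer consumes backslash pairs and therefore closes "C:\\" correctly.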