Extracting string constants from source code held in a string, using regular expressions in Python

Asked: 2013-04-14 16:44:48

Tags: python regex lexical-analysis

How can I get the string constants out of source code that is held in a string?

For example, this is the source code I am trying to process:

var v = "this is string constant + some numbers and \" is also included "

I can't get everything inside the quotes using this regular expression: "(.*?)"

I should not get var v = or anything other than the characters of the string itself.
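The behaviour described in the question can be reproduced with a short sketch. It assumes the source text really contains a backslash before the inner quote, which is why a raw string literal is used; the names source and lazy_matches are illustrative only.

```python
import re

# Raw literal, so the backslash before the inner quote is really part of the text.
source = r'var v = "this is string constant + some numbers and \" is also included "'

# The lazy pattern stops at the first quote it meets, whether escaped or not.
lazy_matches = re.findall(r'"(.*?)"', source)
print(lazy_matches)
```

The single match ends at the backslash, and the rest of the constant is lost because only one unpaired quote remains.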

3 Answers:

Answer 0 (score: 1)

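You need to match an opening quote, then any number of escaped characters or ordinary characters (anything except a quote or a backslash), and then a closing quote: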

"(?:\\.|[^"\\])*"

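As a rough illustration of how the pattern above might be used from Python, the sketch below applies it with re.findall. The capturing group around the body is an addition made for this example, so that the text between the quotes is returned rather than the whole quoted literal; the names STRING_LITERAL, source and contents are assumptions, not part of the answer.

```python
import re

# Pattern from the answer, with a capturing group added (an assumption for this
# sketch) so findall returns the text between the quotes.
STRING_LITERAL = re.compile(r'"((?:\\.|[^"\\])*)"')

# Raw literal: the backslash before the inner quote is part of the text.
source = r'var v = "this is string constant + some numbers and \" is also included "'

contents = STRING_LITERAL.findall(source)
print(contents)
```

The returned text still contains the backslash; unescaping it would be a separate step.
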
Answer 1 (score: 0)

To get everything inside the quotes, you can try passing this pattern to re.findall(): "\".+?\""


Answer 2 (score: 0)

Use a lookbehind to make sure the " is not preceded by a \

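import re

# Note: in this non-raw literal Python reads \" as a plain ", so data holds no backslashes.
data = 'var v = "this is string constant + some numbers and \" is also included "\r\nvar v = "and another \"line\" "'
# Greedy .* runs to the last " on each line; the lookbehind rejects a " preceded by \.
matches = re.findall(r'= "(.*(?<!\\))"', data, re.I | re.M)
print(matches)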

Output:

['this is string constant + some numbers and " is also included ', 'and another "line" ']
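One case worth checking with a lookbehind is a constant that ends in an escaped backslash. The sketch below is only an illustration under assumptions: the input line is hypothetical, the lookbehind pattern is a lazy variant written for this example, and it is compared against the escape-aware pattern from Answer 0 with a capturing group added.

```python
import re

# Hypothetical input: two literals on one line, the first ending in an escaped backslash.
source = r'p = "C:\\" + "tail"'

# Lazy variant of the lookbehind idea: the closing quote must not follow a backslash.
lookbehind = re.findall(r'"(.*?)(?<!\\)"', source)

# Escape-aware pattern: a backslash always consumes the character after it.
escape_aware = re.findall(r'"((?:\\.|[^"\\])*)"', source)

print(lookbehind)
print(escape_aware)
```

On this input the lookbehind treats the real closing quote as escaped and runs on into the next literal, while the escape-aware pattern returns both constants separately.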