Python regular expression for a complex string

Time: 2015-01-20 08:04:26

Tags: python regex

I have the string below and want to split it on the space delimiter while ignoring spaces inside double-quoted values. For example:

string = '''3e656b8e06c176 el-s3-log-file [24/Dec/2014:11:54:18 +0000]
202.141.245.38 arn:aws:iam::xxxxx:user/xyz E27FFBA2CA3D61F3 REST.GET.OBJECT
logs/2014-12-23-09-25-19-E39257 "GET /el-s36 HTTP/1.1" 200 - 660 660 30 30
"https://s3-console-us-standard/Console.html?region&locale=en" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" -'''

values = string.split(' ')

The code above also splits the double-quoted values. Example: ['"GET', '/el-s36', 'HTTP/1.1"']

I want a regular expression that ignores spaces inside double quotes and inside [].

2 answers:

Answer 0: (score: 0)

Use \s instead of a literal space so that newlines are matched too. That is, the input is split on both spaces and newlines.

>>> import re
>>> re.split(r'\s+(?=(?:"[^"]*"|[^"])*$)', string)
['3e656b8e06c176', 'el-s3-log-file', '[24/Dec/2014:11:54:18', '+0000]', '202.141.245.38', 'arn:aws:iam::xxxxx:user/xyz', 'E27FFBA2CA3D61F3', 'REST.GET.OBJECT', 'logs/2014-12-23-09-25-19-E39257', '"GET /el-s36 HTTP/1.1"', '200', '-', '660', '660', '30', '30', '"https://s3-console-us-standard/Console.html?region&locale=en"', '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"', '-']
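The lookahead `(?=(?:"[^"]*"|[^"])*$)` only succeeds when the rest of the string consists of complete `"..."` pairs and non-quote characters, so a whitespace run inside a quoted field is never used as a split point. A minimal sketch of that behaviour, using a shortened, made-up sample line (not the original log entry) and a hypothetical name `OUTSIDE_QUOTES`:

```python
import re

# Shortened, made-up sample in the same shape as the log line above.
sample = 'abc "GET /x HTTP/1.1" 200 -\n"Mozilla/5.0 (Windows)" -'

# Split on whitespace only when everything up to the end of the string is
# made of complete "..." pairs or non-quote characters (i.e. outside quotes).
OUTSIDE_QUOTES = re.compile(r'\s+(?=(?:"[^"]*"|[^"])*$)')

fields = OUTSIDE_QUOTES.split(sample)
print(fields)
```

Note that every candidate split rescans to the end of the string, so this approach assumes the quotes are balanced and can get slow on very long lines.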

Update:

This splits the input string on whitespace that is not inside double quotes or [].

>>> re.split(r'\s+(?=(?:"[^"]*"|[^"])*$)(?![^\[\]]*\])', string)
['3e656b8e06c176', 'el-s3-log-file', '[24/Dec/2014:11:54:18 +0000]', '202.141.245.38', 'arn:aws:iam::xxxxx:user/xyz', 'E27FFBA2CA3D61F3', 'REST.GET.OBJECT', 'logs/2014-12-23-09-25-19-E39257', '"GET /el-s36 HTTP/1.1"', '200', '-', '660', '660', '30', '30', '"https://s3-console-us-standard/Console.html?region&locale=en"', '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"', '-']
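The extra negative lookahead `(?![^\[\]]*\])` rejects a whitespace run when the next bracket character ahead of it is a closing `]`, which is taken to mean the whitespace sits inside `[...]`. A small sketch on a made-up sample, assuming brackets are not nested and that `]` does not appear inside a quoted field (a `]` inside quotes would suppress splits before it); the name `SPLITTER` is hypothetical:

```python
import re

# Made-up sample with one bracketed field and one quoted field.
sample = 'id [24/Dec/2014:11:54:18 +0000] "GET /x HTTP/1.1" 200'

SPLITTER = re.compile(
    r'\s+'
    r'(?=(?:"[^"]*"|[^"])*$)'  # the rest of the string has balanced quotes
    r'(?![^\[\]]*\])'          # the next bracket ahead is not a closing ]
)

fields = SPLITTER.split(sample)
print(fields)
```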

Answer 1: (score: 0)

re.findall(r'(".*?"|\[.*?\]|[^ ]+)', string)

(Updated to use the appropriate amount of greediness.)
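Instead of describing the delimiters, this approach describes the tokens: the alternatives are tried in order, so a quoted field or a bracketed field is matched as a whole before the fallback "run of non-space characters" gets a chance. A minimal sketch on a made-up sample; it swaps `[^ ]+` for `\S+` so that newlines also act as separators, which is a variation on the pattern above and not part of the original answer:

```python
import re

# Made-up sample; note the newline between the bracketed and quoted fields.
sample = 'id [24/Dec/2014:11:54:18 +0000]\n"GET /x HTTP/1.1" 200 -'

# Order matters: quoted field, then bracketed field, then any other
# run of non-whitespace characters. The lazy .*? stops at the first
# closing delimiter rather than the last one.
TOKEN = re.compile(r'".*?"|\[.*?\]|\S+')

tokens = TOKEN.findall(sample)
print(tokens)
```

Because the pattern has no capturing group, `findall` returns the full text of each match, delimiters included.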