I have the string below, and I want to split it on spaces while ignoring the spaces inside double-quoted values. For example:
string = '''3e656b8e06c176 el-s3-log-file [24/Dec/2014:11:54:18 +0000]
202.141.245.38 arn:aws:iam::xxxxx:user/xyz E27FFBA2CA3D61F3 REST.GET.OBJECT
logs/2014-12-23-09-25-19-E39257 "GET /el-s36 HTTP/1.1" 200 - 660 660 30 30
"https://s3-console-us-standard/Console.html?region&locale=en" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" -'''
values = string.split(' ')
The above also splits the double-quoted values. Example: ['"GET', '/el-s36', 'HTTP/1.1"']
I want a regular expression that ignores the spaces inside double quotes and inside [].
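Answer 0 (score: 0)
Use \s instead of a literal space so that newlines are matched as well. That is, the input is split on any whitespace, including newlines, as long as that whitespace is not inside double quotes.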
>>> re.split(r'\s+(?=(?:"[^"]*"|[^"])*$)', string)
['3e656b8e06c176', 'el-s3-log-file', '[24/Dec/2014:11:54:18', '+0000]', '202.141.245.38', 'arn:aws:iam::xxxxx:user/xyz', 'E27FFBA2CA3D61F3', 'REST.GET.OBJECT', 'logs/2014-12-23-09-25-19-E39257', '"GET /el-s36 HTTP/1.1"', '200', '-', '660', '660', '30', '30', '"https://s3-console-us-standard/Console.html?region&locale=en"', '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"', '-']
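To see why the lookahead works: `(?=(?:"[^"]*"|[^"])*$)` only succeeds when everything up to the end of the string is made of complete `"..."` pairs and non-quote characters, so whitespace inside an open quote is never a split point. A minimal sketch on a shortened log line (the `sample` string and the `split_outside_quotes` name are made up for illustration and are not part of the answer):

```python
import re

# Split on whitespace only when the rest of the string contains
# balanced double quotes, i.e. the whitespace is outside a quoted value.
OUTSIDE_QUOTES = re.compile(r'\s+(?=(?:"[^"]*"|[^"])*$)')

def split_outside_quotes(text):
    """Hypothetical helper: split text on whitespace outside double quotes."""
    return OUTSIDE_QUOTES.split(text)

# Shortened sample containing a newline and one quoted value.
sample = 'REST.GET.OBJECT "GET /el-s36 HTTP/1.1" 200\n- 660'
print(split_outside_quotes(sample))
# ['REST.GET.OBJECT', '"GET /el-s36 HTTP/1.1"', '200', '-', '660']
```

This assumes the quotes in the input are balanced. With an unmatched double quote the lookahead fails for every whitespace run before it, and those fields are not split.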
Update
This splits the input string on whitespace that is neither inside double quotes nor inside [].
>>> re.split(r'\s+(?=(?:"[^"]*"|[^"])*$)(?![^\[\]]*\])', string)
['3e656b8e06c176', 'el-s3-log-file', '[24/Dec/2014:11:54:18 +0000]', '202.141.245.38', 'arn:aws:iam::xxxxx:user/xyz', 'E27FFBA2CA3D61F3', 'REST.GET.OBJECT', 'logs/2014-12-23-09-25-19-E39257', '"GET /el-s36 HTTP/1.1"', '200', '-', '660', '660', '30', '30', '"https://s3-console-us-standard/Console.html?region&locale=en"', '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"', '-']
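The extra negative lookahead `(?![^\[\]]*\])` rejects a split point when a `]` follows with no other bracket in between, which is the case when the whitespace sits inside `[...]`. A small sketch under the assumption that brackets are not nested and that no stray `]` appears outside a bracketed field (`sample` and `split_fields` are hypothetical names used only here):

```python
import re

# Whitespace is a split point only if it is outside double quotes
# (first lookahead) and not followed by a closing bracket before any
# other bracket (negative lookahead).
SPLITTER = re.compile(r'\s+(?=(?:"[^"]*"|[^"])*$)(?![^\[\]]*\])')

def split_fields(text):
    """Hypothetical helper: split on whitespace outside quotes and []."""
    return SPLITTER.split(text)

sample = ('el-s3-log-file [24/Dec/2014:11:54:18 +0000] 202.141.245.38 '
          '"GET /el-s36 HTTP/1.1" 200')
print(split_fields(sample))
# ['el-s3-log-file', '[24/Dec/2014:11:54:18 +0000]', '202.141.245.38',
#  '"GET /el-s36 HTTP/1.1"', '200']
```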
Answer 1 (score: 0)
re.findall(r'(".*?"|\[.*?\]|\S+)', string)
(Updated to use the appropriate amount of greediness.)
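Matching the fields directly, rather than splitting on the gaps between them, also makes it easy to drop the surrounding quotes or brackets afterwards. The following is a sketch of that idea, not the answerer's code: it uses a variant of the pattern with negated character classes in place of the lazy `.*?`, and the names `TOKEN`, `unwrap`, `tokenize`, and `sample` are assumptions introduced for illustration.

```python
import re

# A field is a "quoted value", a [bracketed value], or a run of
# non-whitespace characters. Alternatives are tried in this order.
TOKEN = re.compile(r'"[^"]*"|\[[^\]]*\]|\S+')

def unwrap(token):
    """Remove one pair of enclosing quotes or brackets, if present."""
    if len(token) >= 2 and (token[0], token[-1]) in (('"', '"'), ('[', ']')):
        return token[1:-1]
    return token

def tokenize(text, strip_delimiters=False):
    """Return the fields of a log line, optionally without delimiters."""
    tokens = TOKEN.findall(text)
    return [unwrap(t) for t in tokens] if strip_delimiters else tokens

sample = '[24/Dec/2014:11:54:18 +0000] "GET /el-s36 HTTP/1.1" 200 -\n660'
print(tokenize(sample))
# ['[24/Dec/2014:11:54:18 +0000]', '"GET /el-s36 HTTP/1.1"', '200', '-', '660']
print(tokenize(sample, strip_delimiters=True))
# ['24/Dec/2014:11:54:18 +0000', 'GET /el-s36 HTTP/1.1', '200', '-', '660']
```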