My log file records look like this:
107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401
I have this statement to parse the log file:
CREATE TABLE logfile (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;
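What regular expression syntax can I use to parse the time field [23/Aug/2005:00:03:14 -0400] so that it is split into day, minute, second, and so on?
Answer 0 (score: 1)
This regular expression matches the bracketed timestamp and captures the day, month, year, hour, minute, second, and UTC offset in separate groups.
Regular expression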
\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]
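Note: depending on the language, you may need to escape the forward slashes by replacing / with \/. Whether this is required differs from language to language.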
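The escaping note can be checked with a short Python sketch. Python is chosen here only for illustration (the question itself targets Hive), and the names LINE, PLAIN, and ESCAPED are made up for this example. In Python's re module a forward slash is not special, so both spellings of the pattern should behave the same.

```python
import re

# Sample record taken from the question.
LINE = ('107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] '
        '"GET /images/theimage.gif HTTP/1.0" 200 11401')

# The pattern from the answer, unchanged: "/" needs no escaping in Python.
PLAIN = r"\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]"

# The same pattern with every "/" written as "\/", the form needed where
# "/" delimits the regex itself (for example a JavaScript /.../ literal).
ESCAPED = PLAIN.replace("/", r"\/")

plain_groups = re.search(PLAIN, LINE).groups()
escaped_groups = re.search(ESCAPED, LINE).groups()

print(plain_groups)
print(plain_groups == escaped_groups)
```

Both patterns capture the same seven groups, so in a language like this the escaped form is harmless but unnecessary.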
NODE                     EXPLANATION
----------------------------------------------------------------------
\[                       '['
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
/                        '/'
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
[a-zA-Z]{3}              any character of: 'a' to 'z', 'A' to 'Z'
                         (3 times)
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
/                        '/'
----------------------------------------------------------------------
(                        group and capture to \3:
----------------------------------------------------------------------
\d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
)                        end of \3
----------------------------------------------------------------------
:                        ':'
----------------------------------------------------------------------
(                        group and capture to \4:
----------------------------------------------------------------------
\d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
)                        end of \4
----------------------------------------------------------------------
:                        ':'
----------------------------------------------------------------------
(                        group and capture to \5:
----------------------------------------------------------------------
\d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
)                        end of \5
----------------------------------------------------------------------
:                        ':'
----------------------------------------------------------------------
(                        group and capture to \6:
----------------------------------------------------------------------
\d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
)                        end of \6
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(                        group and capture to \7:
----------------------------------------------------------------------
-                        '-'
----------------------------------------------------------------------
\d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
)                        end of \7
----------------------------------------------------------------------
]                        ']'
----------------------------------------------------------------------
Sample text
107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401
Live demo
https://regex101.com/r/hF4fP8/1
Sample matches
[0][0] = [23/Aug/2005:00:03:14 -0400]
[0][1] = 23
[0][2] = Aug
[0][3] = 2005
[0][4] = 00
[0][5] = 03
[0][6] = 14
[0][7] = -0400
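Once the seven groups are captured, they can be combined into a single timestamp value. The following is a minimal Python sketch of that step, not part of the original answer: parse_timestamp, TIMESTAMP_RE, and MONTHS are hypothetical names, and English three-letter month abbreviations are assumed. Note that the answer's pattern only accepts offsets that start with "-".

```python
import re
from datetime import datetime, timedelta, timezone

# The pattern from the answer, compiled once for reuse.
TIMESTAMP_RE = re.compile(
    r"\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]"
)

# Assumption: months use English three-letter abbreviations.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_timestamp(line):
    """Return a timezone-aware datetime for the first timestamp in line, or None."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    day, month, year, hour, minute, second, offset = match.groups()
    month_number = MONTHS.get(month.title())
    if month_number is None:
        return None  # three letters that are not a known month
    # offset looks like "-0400": sign, two hour digits, two minute digits.
    sign = -1 if offset.startswith("-") else 1
    delta = sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return datetime(int(year), month_number, int(day),
                    int(hour), int(minute), int(second),
                    tzinfo=timezone(delta))


sample = ('107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] '
          '"GET /images/theimage.gif HTTP/1.0" 200 11401')
print(parse_timestamp(sample).isoformat())  # 2005-08-23T00:03:14-04:00
```

Mapping the month name through a dictionary rather than a locale-dependent parser keeps the result independent of the machine's language settings.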