Description

Question

My log file records look like this:

107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401

I have this syntax to parse the log file:

CREATE TABLE log_file (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s")
STORED AS TEXTFILE;

What regex syntax can I use to parse the time [23/Aug/2005:00:03:14 -0400] so that it is split into day, minute, and second?
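
Applied to the sample record, the table's input.regex leaves the whole bracketed timestamp in the fourth column, which is why a second pattern is needed to split it. The following is a minimal Python sketch of that behaviour, not Hive itself: Hive evaluates the pattern with Java's regex engine, the `*` quantifiers are assumed to be those of the usual Apache access-log pattern, and the names `INPUT_REGEX`, `COLUMNS`, and `split_log_line` are hypothetical.

```python
import re

# Assumption: the table's input.regex, with the `*` quantifiers of the
# usual Apache access-log pattern. Python's `re` stands in for Java's engine.
INPUT_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)

# Column names in the same order as the capture groups.
COLUMNS = ["host", "identity", "user", "time", "request", "status", "size"]


def split_log_line(line):
    """Return a dict of column name -> captured text, or None if no match."""
    match = INPUT_REGEX.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


line = (
    '107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] '
    '"GET /images/theimage.gif HTTP/1.0" 200 11401'
)
row = split_log_line(line)
print(row["time"])  # [23/Aug/2005:00:03:14 -0400]
```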

Answer 1

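Description

This regular expression will do the following:

Parse the log entry and find the date and time
Capture the individual date parts: day, month, year, hour, minute, second, and UTC offset

Regular expression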

\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]

Note: depending on your language, you may have to escape each / by replacing it with \/. This varies from language to language.
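
As a rough illustration of the pattern in use, here is a minimal Python sketch. Python raw strings do not need the / escaped, so the expression is used unchanged. The names `TIMESTAMP_RE`, `FIELDS`, and `parse_timestamp` are hypothetical and are not part of Hive or of the original answer.

```python
import re

# The answer's pattern, unchanged. In a Python raw string, `/` needs no escaping.
TIMESTAMP_RE = re.compile(
    r"\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]"
)

# Labels for capture groups 1 to 7, in order.
FIELDS = ("day", "month", "year", "hour", "minute", "second", "utc_offset")


def parse_timestamp(line):
    """Find the bracketed timestamp anywhere in the line and name its parts."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


sample = (
    '107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] '
    '"GET /images/theimage.gif HTTP/1.0" 200 11401'
)
parts = parse_timestamp(sample)
print(parts["day"], parts["month"], parts["year"])  # 23 Aug 2005
```

The last group, `(-\d{4})`, accepts only negative offsets as written. A log with offsets such as +0100 would need `[+-]\d{4}` instead.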

Explanation


NODE                     EXPLANATION
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-zA-Z]{3}              any character of: 'a' to 'z', 'A' to 'Z'
                             (3 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    -                        '-'
----------------------------------------------------------------------
    \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
  )                        end of \7
----------------------------------------------------------------------
  ]                        ']'
----------------------------------------------------------------------

Sample text

107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401

Live demo

https://regex101.com/r/hF4fP8/1

Sample matches

[0][0] = [23/Aug/2005:00:03:14 -0400]
[0][1] = 23
[0][2] = Aug
[0][3] = 2005
[0][4] = 00
[0][5] = 03
[0][6] = 14
[0][7] = -0400
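
Once the seven parts are captured as strings, they can be combined into a single timezone-aware value. The following is a minimal Python sketch under stated assumptions: month abbreviations are English three-letter names, and `MONTHS` and `to_datetime` are hypothetical helpers that are not part of Hive.

```python
from datetime import datetime, timedelta, timezone

# Assumption: English three-letter month abbreviations, as in the sample log.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def to_datetime(day, month, year, hour, minute, second, utc_offset):
    """Build a timezone-aware datetime from the seven captured strings."""
    sign = -1 if utc_offset.startswith("-") else 1
    digits = utc_offset.lstrip("+-")  # e.g. "0400" -> 4 hours, 0 minutes
    offset = sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:]))
    return datetime(
        int(year), MONTHS[month.capitalize()], int(day),
        int(hour), int(minute), int(second),
        tzinfo=timezone(offset),
    )


stamp = to_datetime("23", "Aug", "2005", "00", "03", "14", "-0400")
print(stamp.isoformat())  # 2005-08-23T00:03:14-04:00
```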

How do I use a regular expression in Hive to parse an Apache log timestamp?
