如何从日志文件中提取唯一事件并对其进行计数?

时间:2013-03-23 20:04:21

标签: regex logging extract

我有一个shoutcast连接日志文件,想要找出使用的客户端和频率。日志文件非常庞大(大约100mb),包含过去3年的条目。日志条目看起来像这样(IP已被随机化!):

<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)

我想提取每个独特的客户端,并计算这种客户端的使用频率。对于上面的日志,结果应如下所示:

Internet%20Explorer%207   2
WMPlayer/10.0.0.364       1
WinampMPEG/5.50           2
TapinRadio                1

首先,我只是过滤了所有无用的条目。 (抱歉using cat。)

cat shoutcast.log | grep "starting stream" > filtered.txt

结果如下:

<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)

但现在呢?我有点迷失,如何访问{A: }括号中的信息?

1 个答案:

答案 0 :(得分:2)

尝试这个awk行:

 awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' input

使用您的数据进行测试:

kent$  cat ff
<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)

kent$  awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' ff
WMPlayer/10.0.0.364 : 1
TapinRadio : 1
WinampMPEG/5.50 : 2
Internet%20Explorer%207 : 2