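I have a SHOUTcast connection log file and want to find out which clients are being used and how often. The log file is quite large (about 100 MB) and contains entries from the past 3 years. The log entries look like this (the IPs have been randomized!):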
<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)
I want to extract every distinct client and count how often each one is used. For the log above, the result should look like this:
Internet%20Explorer%207 2
WMPlayer/10.0.0.364 1
WinampMPEG/5.50 2
TapinRadio 1
First, I simply filtered out all the irrelevant entries. (Sorry for using cat.)
cat shoutcast.log | grep "starting stream" > filtered.txt
The result looks like this:
<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
But what now? I'm a bit lost: how do I get at the information inside the {A: } braces?
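One way to pull out just the text inside {A: } is a sed substitution that captures everything up to the next "}", followed by the classic sort | uniq -c counting idiom. This is a minimal sketch that swaps in sed rather than awk; the function name count_clients is hypothetical, and it assumes every client string sits between a literal "{A: " and the next "}", as in the sample lines.

```shell
# Hypothetical helper: count SHOUTcast clients with sed + sort + uniq -c.
# Usage: count_clients shoutcast.log
count_clients() {
    # -n suppresses default output. On "starting stream" lines, replace the
    # whole line with the text between "{A: " and the next "}", and print it
    # only if that substitution succeeded (the trailing p flag).
    sed -n '/starting stream/s/.*{A: \([^}]*\)}.*/\1/p' "$1" |
        sort |      # uniq only merges adjacent duplicates, so sort first
        uniq -c |   # prefix each distinct client with its number of occurrences
        sort -rn    # largest count first
}
```

Because the "/starting stream/" address already selects the relevant lines, this also makes the separate grep step unnecessary. Note that uniq -c prints the count before the client name, the reverse of the layout requested above.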
Answer (score: 2)
Try this awk one-liner:
awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' input
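The same idea, spread over several lines with comments, might look like the sketch below. It is wrapped in a hypothetical shell function count_clients_awk that takes the log file as its argument. Two small changes from the one-liner are assumptions of this sketch: the braces are written as the bracket expressions [{] and [}] to keep them unambiguously literal across awk implementations, and the output uses a space instead of " : " to match the requested "client count" format.

```shell
# Hypothetical wrapper around the awk approach; $1 is the log file.
count_clients_awk() {
    awk -F'[{]A: |[}]' '
        # The field separator is a regex: a field ends at "{A: " or at "}".
        # On a "starting stream" line this leaves the client string in $2:
        #   "...[L: 2]{A: WinampMPEG/5.50}(P: 1)"
        #   -> $1 = "...[L: 2]", $2 = "WinampMPEG/5.50", $3 = "(P: 1)"
        /starting stream/ { count[$2]++ }
        END {
            # "for (k in arr)" visits the keys in an unspecified order.
            for (client in count)
                print client, count[client]
        }
    ' "$1"
}
```

Since awk reads the log one line at a time and keeps only one counter per distinct client, memory use stays small even for a 100 MB file.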
Tested with your data:
kent$ cat ff
<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)
kent$ awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' ff
WMPlayer/10.0.0.364 : 1
TapinRadio : 1
WinampMPEG/5.50 : 2
Internet%20Explorer%207 : 2
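The order of the lines above is arbitrary, because awk's for (x in a) loop visits array keys in an unspecified order. If a ranking by frequency is wanted, one option is to print the count first and let sort order the lines numerically. This is a sketch; top_clients is a hypothetical name, and $1 is assumed to be the log file.

```shell
# Hypothetical helper: rank clients by how often they connected; $1 is the log.
top_clients() {
    awk -F'[{]A: |[}]' '
        /starting stream/ { count[$2]++ }
        # Emit "count client" so that sort can key on the leading number.
        END { for (client in count) print count[client], client }
    ' "$1" | sort -rn   # numeric (-n), descending (-r)
}
```

With the sample log, the two clients seen twice come first; the relative order of clients with equal counts is not significant.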