Question

I have a log file that looks like this:

Client connected with ID 8127641241
< multiple lines of unimportant log here>
Client not responding
Total duration: 154.23583
Sent: 14
Received: 9732
Client lost

Client connected with ID 2521598735
< multiple lines of unimportant log here>
Client not responding
Total duration: 12.33792
Sent: 2874
Received: 1244
Client lost

The log contains many blocks that start with Client connected with ID 1234 and end with Client lost. The blocks never interleave (only one client is connected at a time).

How would I parse this file and generate statistics like the following?

[Image: table of the expected statistics]

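I'm mainly asking about the parsing, not the formatting.

I suppose I could loop over all the lines, set a flag when I find a Client connected line, and save the ID in a variable. Then I would grep the following lines and save the values until I find a Client lost line. Is this a good approach? Is there a better one?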

Answer 1

Here's a quick way using awk:

awk 'BEGIN { print "ID Duration Sent Received" } /^(Client connected|Total duration:|Sent:)/ { printf "%s ", $NF } /^Received:/ { print $NF }' file | column -t

Result:

ID          Duration   Sent  Received
8127641241  154.23583  14    9732
2521598735  12.33792   2874  1244

Answer 2

If you are sure the log file contains no errors and the fields always appear in the same order, you can use the following:

#!/bin/bash

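ids=()
# Associative arrays keyed by client ID (requires bash 4+)
declare -A duration
declare -A sent
declare -A received
# grep keeps only the four lines of interest per block, in order
while read -r _ _ _ _ id; do
   ids+=( "$id" )
   read -r _ _ "duration[$id]"
   read -r _ "sent[$id]"
   read -r _ "received[$id]"
done < <(grep -E '^(Client connected with ID|Total duration:|Sent:|Received:)' logfile)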

# printing the data out, for control purposes only
for id in "${ids[@]}"; do
   printf "ID=%s\n\tDuration=%s\n\tSent=%s\n\tReceived=%s\n" "$id" "${duration[$id]}" "${sent[$id]}" "${received[$id]}"
done

The output is:

$ ./parsefile
ID=8127641241
    Duration=154.23583
    Sent=14
    Received=9732
ID=2521598735
    Duration=12.33792
    Sent=2874
    Received=1244

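But the data is stored in the corresponding associative arrays. It's quite efficient. It might be slightly more efficient in another programming language (e.g. Perl), but since you only tagged the post with bash, sed, and grep, I think this fully answers your question.

Explanation: grep keeps only the lines we are interested in, and bash reads only the fields we are interested in, assuming they always appear in the same order. The script should be easy to understand and modify to suit your needs.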

Answer 3

AWK:

awk 'BEGIN{print "ID Duration Sent Received"}/with ID/&&!f{f=1}f&&/Client lost/{print a[1],a[2],a[3],a[4];f=0}f{for(i=1;i<=NF;i++){
        if($i=="ID")a[1]=$(i+1)
        if($i=="duration:")a[2]=$(i+1)
        if($i=="Sent:")a[3]=$(i+1)
        if($i=="Received:")a[4]=$(i+1)
}}' log

If there is always a blank line between data blocks, the awk script above can be simplified to:

awk -v RS="" 'BEGIN{print "ID Duration Sent Received"}
{for(i=1;i<=NF;i++){
        if($i=="ID")a[1]=$(i+1)
        if($i=="duration:")a[2]=$(i+1)
        if($i=="Sent:")a[3]=$(i+1)
        if($i=="Received:")a[4]=$(i+1)
}print a[1],a[2],a[3],a[4];}' log

Output:

ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244

For nicer formatting, pipe the output to column -t

You get:

ID          Duration   Sent  Received
8127641241  154.23583  14    9732
2521598735  12.33792   2874  1244

Answer 4

A solution in Perl:

#!/usr/bin/perl

use warnings;
use strict;

print "\tID\tDuration\tSent\tReceived\n";

while (<>) {
  chomp;
  if (/Client connected with ID (\d+)/) {
    print "$1\t";
  }
  if (/Total duration: ([\d\.]+)/) {
    print "$1\t";
  }
  if (/Sent: (\d+)/) {
    print "$1\t";
  }
  if (/Received: (\d+)/) {
    print "$1\n";
  }
}

Sample output:

        ID  Duration    Sent    Received
8127641241  154.23583   14  9732
2521598735  12.33792    2874    1244

Answer 5

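Use paragraph mode to slurp the file

With Perl or AWK, you can slurp in whole records using a special paragraph mode, which treats the blank lines between records as the separator. In Perl, use -00 to enable paragraph mode; in AWK, set the RS variable to the empty string (i.e. "") to do the same. You can then parse the fields within each record.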
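A minimal sketch of paragraph mode in AWK, assuming records are always separated by a blank line as in the sample. The function name parse_paragraphs and the inline sample data are illustrative only, not part of the original question:

```shell
#!/bin/bash
# Hypothetical helper: parse blank-line-separated records with awk paragraph mode.
parse_paragraphs() {
    # RS="" makes each blank-line-separated block one record;
    # -F with a newline makes each line of the block one field.
    awk -v RS="" -F'\n' '
    {
        id = dur = sent = recv = ""
        for (i = 1; i <= NF; i++) {
            n = split($i, w, " ")          # w[n] is the last word on the line
            if ($i ~ /^Client connected with ID/) id   = w[n]
            else if ($i ~ /^Total duration:/)     dur  = w[n]
            else if ($i ~ /^Sent:/)               sent = w[n]
            else if ($i ~ /^Received:/)           recv = w[n]
        }
        print id, dur, sent, recv
    }' "$@"
}

# Sample input copied from the question (the filler lines are omitted).
sample='Client connected with ID 8127641241
Client not responding
Total duration: 154.23583
Sent: 14
Received: 9732
Client lost

Client connected with ID 2521598735
Client not responding
Total duration: 12.33792
Sent: 2874
Received: 1244
Client lost'

printf '%s\n' "$sample" | parse_paragraphs
```

With a real log you would call parse_paragraphs logfile and, if desired, pipe the result to column -t.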

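Use line-oriented statements

Alternatively, you can use a shell while loop to read one line at a time and then use grep or sed to parse each line. You could even use a case statement, depending on how complex the parsing is.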

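For example, assuming each record always contains exactly four numeric fields and no other line contains digits, you could do something like this:

while read -r line; do
    grep -Eo '[[:digit:]]+(\.[[:digit:]]+)?' <<< "$line"
done < /tmp/foo | xargs -n4 | sed 's/ /\t/g'

The loop produces:

8127641241  154.23583   14  9732
2521598735  12.33792    2874    1244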

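You can of course play with the formatting, add a header row, and so on. The point is that you have to know your data.

AWK, Perl, and even Ruby are better choices for parsing record-oriented formats, but if your needs are basic, the shell is certainly an option.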
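As a rough illustration of the case-statement variant mentioned above, here is a sketch in plain bash. The function name parse_lines and the tab-separated output are assumptions made for this example; it relies on every block ending with a Client lost line:

```shell
#!/bin/bash
# Hypothetical helper: line-oriented parsing with a while loop and case.
parse_lines() {
    local line id="" duration="" sent="" received=""
    while IFS= read -r line; do
        case $line in
            # ${line##* } strips everything up to the last space,
            # leaving only the final word of the line.
            "Client connected with ID "*) id=${line##* } ;;
            "Total duration: "*)          duration=${line##* } ;;
            "Sent: "*)                    sent=${line##* } ;;
            "Received: "*)                received=${line##* } ;;
            "Client lost")
                # End of block: emit one row, then reset for the next client.
                printf '%s\t%s\t%s\t%s\n' "$id" "$duration" "$sent" "$received"
                id="" duration="" sent="" received=""
                ;;
        esac
    done
}

# Sample run on one block from the question; other lines are ignored.
parse_lines <<'EOF'
Client connected with ID 8127641241
Client not responding
Total duration: 154.23583
Sent: 14
Received: 9732
Client lost
EOF
```

Unlike the grep-based loop, this does not depend on the other log lines being free of digits, because only lines with a known prefix are inspected. For a real file, run parse_lines < logfile.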

Answer 6

A short Perl snippet:

perl -ne '
    BEGIN {print "ID Duration Sent Received\n";}
    print "$1 " if /(?:ID|duration:|Sent:|Received:) (.+)$/;
    print "\n" if /^Client lost/;
' filename | column -t

Answer 7

awk -v RS= -F'\n' '
BEGIN{ printf "%15s%15s%15s%15s\n","ID","Duration","Sent","Received" }
{
   for (i=1;i<=NF;i++) {
      n = split($i,f,/ /)    
      if ( $i ~ /^(Client connected|Total duration:|Sent:|Received:)/ ) {
         printf "%15s",f[n]
      }
   }
   print ""
}' file

Best way to parse a log file
