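Collecting information from multiple lines

Time: 2015-11-25 07:44:06

Tags: bash awk sed

I need to extract certain information from multiple lines (5 lines per transaction) and write the output as a CSV file. The lines come from a maillog, where each transaction has its own transaction ID. Here is a sample transaction: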

Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender@domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107@server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO  user.log  - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO  user.log  - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107@server01>, from=<sender@domain>, size=2488, to=<recipient@domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient@domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)

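What I tried was to join these 5 lines into 1 line and use awk to parse each column. Unfortunately, the number of columns is not uniform.

I am looking to get the date/time (line 1, columns 1-3), the sender, the recipient, and the subject (line 3, the words after "CLEAN - " up to the end of the line).

Preferably sed or awk in bash. Thanks!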

2 answers:

Answer 0: (score: 0)

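Explanation: file is your file. The script starts with id and block as empty strings. On the first line, id takes the value of field 6 (the transaction ID, with any trailing colon removed) and block starts with that line. After that, every line that contains id is appended to block, separated by a space. When a line no longer contains id, block is printed, and block and id are reinitialized from that line. The END rule prints the last block.

 awk '{ if (id != "" && index($0, id)) block = block " " $0; else { if (block != "") print block; block = $0; id = $6; sub(/:$/, "", id) } } END { if (block != "") print block }' file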

You will then have to process each line of the output.
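One caveat with post-processing joined lines: once the lines are joined, the end of line 3 (where the subject stops) is no longer marked. As a possible alternative, here is a minimal sketch that skips the join and collects the fields per transaction ID while reading the log. It is not the approach from the answer above. The function name maillog_to_csv is made up for illustration, and the sketch assumes that queue IDs are purely alphanumeric, that MM lines carry the ID as "-<id>]", and that all other lines carry it in field 6.

```shell
#!/bin/bash
# Sketch: one CSV row (date,sender,recipient,subject) per transaction ID.
# Assumptions (not from the original post): queue IDs are alphanumeric,
# MM lines carry the ID as "-<id>]", other lines carry it in field 6.

maillog_to_csv() {
    awk '
    # Quote a CSV field, doubling any embedded double quotes.
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }

    # Return the transaction ID of the current line, or "" if none is found.
    function txid(   s) {
        if ($5 == "MM:") {
            if (match($0, /-[A-Za-z0-9]+\]/))
                return substr($0, RSTART + 1, RLENGTH - 2)
            return ""
        }
        s = $6
        sub(/:$/, "", s)
        return s
    }

    {
        id = txid()
        if (id == "") next
        if (!(id in date)) {              # first line seen for this ID
            date[id] = substr($0, 1, 15)  # "Mon DD HH:MM:SS"
            order[++n] = id
        }
        # Subject line: take everything after "CLEAN - " and stop here,
        # so subject text is never scanned for from=/to=.
        if ((i = index($0, "CLEAN - ")) > 0) {
            subj[id] = substr($0, i + 8)
            next
        }
        if (match($0, /from=<[^>]*>/))
            from[id] = substr($0, RSTART + 6, RLENGTH - 7)
        if (match($0, /to=<[^>]*>/))
            rcpt[id] = substr($0, RSTART + 4, RLENGTH - 5)
    }

    END {
        # Print rows in the order the transactions first appeared.
        for (k = 1; k <= n; k++) {
            id = order[k]
            print q(date[id]) "," q(from[id]) "," q(rcpt[id]) "," q(subj[id])
        }
    }
    ' "$@"
}

# Demo on the five sample lines from the question.
csv=$(maillog_to_csv <<'EOF'
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender@domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107@server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO  user.log  - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO  user.log  - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107@server01>, from=<sender@domain>, size=2488, to=<recipient@domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient@domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
EOF
)
printf '%s\n' "$csv"
```

Because rows are keyed by ID and only printed in the END rule, interleaved transactions should still end up in separate rows, at the cost of holding every transaction in memory. With several recipients per message, this sketch keeps only the last to= address it sees.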

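Answer 1: (score: 0)

There are many ways to approach this. Below is an example of a simple script that is called with the log file name as its first argument. It parses the requested data and saves each piece into its own variable. It simply prints the results at the end.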

#!/bin/bash

[ -r "$1" ] || {  ## validate input file readable
    printf "error: invalid argument, file not readable '%s'\n" "$1"
    exit 1
}

while read -r line; do

    ## set date from line containing from/sender
    if grep -q 'from=<' <<<"$line"; then
        dt=$(cut -c -15 <<<"$line")
        ## [^>]* also matches addresses containing dots or hyphens
        from=$(grep -o 'from=<[^>]*>' <<<"$line")
        sender=${from##*<}
        sender=${sender%>*}
    fi
    ## search each line for CLEAN
    if grep -q 'CLEAN - ' <<<"$line"; then
        subject=$(grep -o 'CLEAN.*$' <<<"$line")
        subject="${subject#*CLEAN - }"
    fi
    ## search line for to
    if grep -q 'to=<' <<<"$line"; then
        to=$(grep -o 'to=<[^>]*>' <<<"$line")
        to=${to##*<}
        to=${to%>*}
    fi

done < "$1"

printf " date   : %s\n from   : %s\n to     : %s\n subject: \"%s\"\n" \
"$dt" "$sender" "$to" "$subject"

Input

$ cat dat/mail.log
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender@domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107@server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO  user.log  - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO  user.log  - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107@server01>, from=<sender@domain>, size=2488, to=<recipient@domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient@domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)

Output

$ bash parsemail.sh dat/mail.log
 date   : Nov 17 00:15:19
 from   : sender@domain
 to     : recipient@domain
 subject: "Declaration for Shared Parental Leave Allocation System"

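Note: if the from/sender entry is not always on the first line, you can simply move those lines out from under the test clause. Let me know if you have any questions.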
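Since the question asks for CSV output, the final printf could be swapped for something like the sketch below. The helper names csv_field and csv_row are made up for illustration and are not part of the script above. The variables are set by hand here to the values the script collects for the sample transaction, so the snippet runs on its own.

```shell
#!/bin/bash
# Sketch: print the collected variables as one CSV row.
# csv_field and csv_row are hypothetical helper names.

# Wrap a value in double quotes, doubling any embedded double quotes.
csv_field() {
    local s=$1 dq='"'
    s=${s//$dq/$dq$dq}
    printf '"%s"' "$s"
}

# Join all arguments into a single comma-separated CSV row.
csv_row() {
    local out="" f
    for f in "$@"; do
        out+="${out:+,}$(csv_field "$f")"
    done
    printf '%s\n' "$out"
}

# Values as collected by the loop for the sample transaction.
dt="Nov 17 00:15:19"
sender="sender@domain"
to="recipient@domain"
subject="Declaration for Shared Parental Leave Allocation System"

row=$(csv_row "$dt" "$sender" "$to" "$subject")
printf '%s\n' "$row"
```

Quoting every field keeps subjects that contain commas or double quotes from breaking the column layout. To get one row per transaction rather than a single row per file, csv_row would have to be called inside the loop once a transaction is complete.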