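I need to extract certain information from multiple lines (5 lines per transaction) and write the output as a CSV file. The lines come from a maillog in which every transaction has its own transaction ID. Here is a sample transaction: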
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender@domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107@server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107@server01>, from=<sender@domain>, size=2488, to=<recipient@domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient@domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
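What I tried: I joined these 5 lines into 1 line and used awk to parse each column. Unfortunately, the number of columns is not uniform.
I'm looking to get the date/time (line 1, columns 1-3), the sender, the recipient and the subject (line 3, the words after "CLEAN - " to the end of the line).
Preferably sed or awk in bash. Thanks!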
Answer 0 (score: 0)
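Explanation: file is your log file.
The script starts with id and block as empty strings. On the first line, id takes the value of field 6 (the transaction ID, with its trailing colon stripped) and block starts with that line. After that, every line that contains id is appended to block. When a line no longer contains id, block is printed, and block and id are reinitialized from that line. The END rule prints the last block.
awk 'id=="" || !index($0,id) {if (block!="") print block; block=$0; id=$6; sub(/:$/,"",id); next} {block=block " " $0} END {if (block!="") print block}' file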
You will then have to process each line of the output.
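One possible way to do that processing in the same pass is to collect the fields per transaction ID and print CSV rows at the end. The following is only a sketch under assumptions taken from the sample transaction: sm-mta lines carry the ID as field 6 followed by a colon, and the scanner lines end their bracketed prefix with "-ID] ". The function name maillog_to_csv and the helpers angle and csv are made up for this illustration.

```shell
#!/bin/bash
# Sketch: print one CSV row (date, sender, recipient, subject) per transaction.
# Assumptions (based on the sample lines only): sm-mta lines have the ID in
# field 6 with a trailing colon; scanner lines contain "-<ID>] " and "CLEAN - ".
maillog_to_csv() {
    awk '
    # Return the text between key=< and the next > in s, or an empty string.
    function angle(s, key,    i, rest) {
        i = index(s, key "=<")
        if (i == 0) return ""
        rest = substr(s, i + length(key) + 2)
        return substr(rest, 1, index(rest, ">") - 1)
    }
    # Quote a CSV field, doubling any embedded double quotes.
    function csv(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }

    # sm-mta lines: the from= line gives date and sender, the to= line the recipient.
    $5 ~ /^sm-mta\[/ {
        id = $6; sub(/:$/, "", id)
        if ($7 ~ /^from=</) {
            if (!(id in date)) order[++n] = id   # remember first-seen order
            date[id] = $1 " " $2 " " $3
            sender[id] = angle($0, "from")
        } else if ($7 ~ /^to=</) {
            rcpt[id] = angle($0, "to")           # the last recipient seen wins
        }
        next
    }
    # Scanner lines: the ID closes the bracketed prefix, the subject follows "CLEAN - ".
    match($0, /-[A-Za-z0-9]+\] /) && (i = index($0, "CLEAN - ")) {
        id = substr($0, RSTART + 1, RLENGTH - 3)
        subject[id] = substr($0, i + 8)
    }
    END {
        print "date,sender,recipient,subject"
        for (k = 1; k <= n; k++) {
            id = order[k]
            print csv(date[id]) "," csv(sender[id]) "," csv(rcpt[id]) "," csv(subject[id])
        }
    }' "$1"
}
```

It could be called as maillog_to_csv file > out.csv. Because the fields are keyed by ID, the lines of different transactions may be interleaved. A message with several recipients would need extra handling, since this sketch keeps only the last one.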
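Answer 1 (score: 0)
There are many ways to approach this. Below is an example of a simple script that is called with the log file name as its first argument. It parses out the requested data and saves each piece in its own variable. It simply prints the results at the end.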
#!/bin/bash

[ -r "$1" ] || {    ## validate that the input file is readable
    printf "error: invalid argument, file not readable '%s'\n" "$1" >&2
    exit 1
}

while IFS= read -r line; do

    ## set date and sender from a line containing from=<...>
    if [[ $line == *'from=<'* ]]; then
        dt=$(cut -c -15 <<<"$line")
        ## [^>]* also accepts dots and hyphens in the address
        from=$(grep -o 'from=<[^>]*>' <<<"$line")
        sender=${from##*<}
        sender=${sender%>*}
    fi

    ## search each line for "CLEAN - "; the subject is everything after it
    if [[ $line == *'CLEAN - '* ]]; then
        subject="${line#*CLEAN - }"
    fi

    ## search line for to=<...>
    if [[ $line == *'to=<'* ]]; then
        to=$(grep -o 'to=<[^>]*>' <<<"$line")
        to=${to##*<}
        to=${to%>*}
    fi

done < "$1"

printf " date : %s\n from : %s\n to : %s\n subject: \"%s\"\n" \
    "$dt" "$sender" "$to" "$subject"
Input
$ cat dat/mail.log
Nov 17 00:15:19 server01 sm-mta[14107]: tAGGFJla014107: from=<sender@domain>, size=2447, class=0, nrcpts=1, msgid=<201511161615.tAGGFJla014107@server01>, proto=ESMTP, daemon=MTA, tls_verify=NONE, auth=NONE, relay=[100.24.134.19]
Nov 17 00:15:19 server01 flow-control[6033]: tAGGFJla014107 accepted
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - virus.McAfee: CLEAN - Declaration for Shared Parental Leave Allocation System
Nov 17 00:15:19 server01 MM: [Jilter Processor 21 - Async Jilter Worker 9 - 127.0.0.1:51698-tAGGFJla014107] INFO user.log - mtaqid=tAGGFJla014107, msgid=<201511161615.tAGGFJla014107@server01>, from=<sender@domain>, size=2488, to=<recipient@domain>, relay=[100.24.134.19], disposition=Deliver
Nov 17 00:15:20 server01 sm-mta[14240]: tAGGFJla014107: to=<recipient@domain>, delay=00:00:01, xdelay=00:00:01, mailer=smtp, pri=122447, relay=relayserver.domain. [100.91.20.1], dsn=2.0.0, stat=Sent (tAGGFJlR021747 Message accepted for delivery)
Output
$ bash parsemail.sh dat/mail.log
date : Nov 17 00:15:19
from : sender@domain
to : recipient@domain
subject: "Declaration for Shared Parental Leave Allocation System"
Note: if the from/sender is not always on the first line, you can simply move those lines out from under the test clause. Let me know if you have any questions.
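The script prints a single record at the end, so on a log holding many transactions only the last values would survive. A variation that prints one CSV row per transaction might look like the sketch below. It rests on two assumptions that are not part of the answer: the lines of a transaction are contiguous, and the line containing " stat=" is the last line of a transaction. The function name parse_maillog is made up for this illustration.

```shell
#!/bin/bash
# Sketch: emit one CSV row per transaction using only bash built-ins.
# Assumptions: transaction lines are not interleaved, and the line that
# contains " stat=" closes the transaction.
parse_maillog() {
    local line dt sender to subject
    local q='"'
    # Keep the regexes in variables so < and > need no escaping inside [[ ]].
    local re_from='from=<([^>]*)>'
    local re_to='to=<([^>]*)>'

    while IFS= read -r line; do
        # The first from=<...> of a transaction also supplies the date.
        if [[ -z $sender && $line =~ $re_from ]]; then
            dt=${line:0:15}
            sender=${BASH_REMATCH[1]}
        fi
        # The subject is everything after "CLEAN - ".
        if [[ $line == *'CLEAN - '* ]]; then
            subject="${line#*CLEAN - }"
        fi
        if [[ $line =~ $re_to ]]; then
            to=${BASH_REMATCH[1]}
        fi
        # Closing line: print the row (doubling quotes in the subject) and reset.
        if [[ $line == *' stat='* ]]; then
            printf '"%s","%s","%s","%s"\n' \
                "$dt" "$sender" "$to" "${subject//$q/$q$q}"
            dt= sender= to= subject=
        fi
    done < "$1"
}
```

Calling parse_maillog dat/mail.log > mail.csv would then write one row for the sample input. Matching with [[ ... =~ ... ]] and BASH_REMATCH avoids starting grep and cut for every line, which tends to matter on large logs.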