How can I compute time-based metrics (hourly averages) from log file data?
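To make this clearer, consider a log file with entries like the ones below. Each UID appears exactly twice in the log. The entries are in an embedded XML format and may appear out of order. The log file holds only one day's data, so every record is from the same day.
The log file contains about 2 million UIDs.
I need to find the average hourly response time of these requests. Below are a request and its response from the log file. The UID is the key that correlates a request with its response.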
2013-04-03 08:54:19,451 INFO [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.448-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body><log-message-body><headers>&lt;FedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"&gt;&lt;logFilter&gt;7&lt;/logFilter&gt;&lt;logSeverity&gt;255&lt;/logSeverity&gt;&lt;schemaType&gt;PCMC.MRP.DocumentMetaData&lt;/schemaType&gt;&lt;UID&gt;073104c-4e-4ce-bda-694344ee62&lt;/UID&gt;&lt;consumerSystemId&gt;JTR&lt;/consumerSystemId&gt;&lt;consumerLogin&gt;jbserviceid&lt;/consumerLogin&gt;&lt;logLocation&gt;Beginning of Service&lt;/logLocation&gt;&lt;/fedDKPLoggingContext&gt;</headers><payload>
&lt;ratedDocument&gt;
&lt;objectType&gt;OLB_BBrecords&lt;/objectType&gt;
&lt;provider&gt;JET&lt;/provider&gt;
&lt;metadata&gt;&amp;lt;BooleanQuery&amp;gt;&amp;lt;Clause occurs=&amp;quot;must&amp;quot;&amp;gt;&amp;lt;TermQuery fieldName=&amp;quot;RegistrationNumber&amp;quot;&amp;gt;44565153050735751&amp;lt;/TermQuery&amp;gt;&amp;lt;/Clause&amp;gt;&amp;lt;/BooleanQuery&amp;gt;&lt;/metadata&gt;
&lt;/ratedDocument&gt;
</payload></log-message-body></body></log-event>
2013-04-03 08:54:19,989 INFO [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.987-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body><log-message-body><headers>&lt;fedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"&gt;&lt;logFilter&gt;7&lt;/logFilter&gt;&lt;logSeverity&gt;255&lt;/logSeverity&gt;&lt;schemaType&gt;PCMC.MRP.DocumentMetaData&lt;/schemaType&gt;&lt;UID&gt;073104c-4e-4ce-bda-694344ee62&lt;/UID&gt;&lt;consumerSystemId&gt;JTR&lt;/consumerSystemId&gt;&lt;consumerLogin&gt;jbserviceid&lt;/consumerLogin&gt;&lt;logLocation&gt;Successful Completion of Service&lt;/logLocation&gt;&lt;/fedDKPLoggingContext&gt;</headers><payload>0</payload></log-message-body></body></log-event>
Here is the bash script I wrote:
uids=`cat $i|grep "Service" |awk 'BEGIN {FS="lt;";RS ="gt;"} {print $2;}'| sort -u`
for uid in ${uids}; do
count=`grep "$uid" test.log|wc -l`
if [ "${count}" -ne "0" ]; then
unique_uids[counter]="$uid"
let counter=counter+1
fi
done
echo ${unique_uids[@]}
echo $counter
echo " Unique No:" ${#unique_uids[@]}
echo "uid StartTime EndTime" > $log
for unique_uids in ${unique_uids[@]} ; do
responseTime=`cat $i|grep "${unique_uids}" |awk '{split($2,Arr,":|,"); print Arr[1]*3600000+Arr[2]*60000+Arr[3]*1000+Arr[4]}'|sort -n`
echo $unique_uids $responseTime >> $log
done
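The output should look like this. The operation comes from the id, the consumer comes from documentmetadata, and the hour is 08 (from 08:54:XX). So if there are multiple requests and responses, the response times of the requests in that hour need to be averaged.
Operation Consumer Hour AverageResponseTime(ms)
DKP_DumpDocumentProperties MRP 08 538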
Answer 0 (score: 6)
Given an input file like the one you posted:
$ cat file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
this GNU awk script (you are using GNU awk, since the script you posted in your question sets RS to a multi-character string)
$ cat tst.awk
{
date = $1
time = $2
# third argument 1 = replace the first match (an empty string makes gawk warn)
guid = gensub(/.*;gt;([^&]+).*/,"\\1",1)
print guid, date, time
}
will pull out what I think is the information you care about:
$ gawk -f tst.awk file
904c-be-4e-bbda-3e62 2013-04-03 08:54:19,989
904c-be-4e-bbda-3e62 2013-04-03 08:54:39,389
edfc-fr-5e-bced-3443 2013-04-03 08:54:34,979
edfc-fr-5e-bced-3443 2013-04-03 08:55:19,569
The rest is just simple math, right? And do it in this awk script; don't pipe the awk output to some goofy shell loop!
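One way the remaining math could look is sketched below. It is only an illustration under stated assumptions, not the answerer's code: the function name `hourly_avg` and the helper `to_ms` are hypothetical, each UID is assumed to appear on exactly two lines, the earlier timestamp is taken as the request (so out-of-order lines are tolerated), and a pair is counted in the hour in which its request was logged. It groups by hour only, because the simplified sample lines carry no operation or consumer fields, and it uses POSIX `match`/`substr` rather than `gensub` so it should also run on non-GNU awks.

```shell
# hourly_avg FILE...: print "<hour> <average response time in ms>" per hour.
# Assumptions (not from the original post): two lines per UID, the earlier
# timestamp is the request, and a pair belongs to the hour of its request.
hourly_avg() {
    awk '
    # Convert "HH:MM:SS,mmm" to milliseconds since midnight.
    function to_ms(t,    p) {
        split(t, p, /[:,]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Only lines containing an escaped UId/UID element are processed.
    match($0, /&lt;U[Ii][Dd]&gt;[^&]+/) {
        uid = substr($0, RSTART + 11, RLENGTH - 11)   # skip the 11-char tag
        ms = to_ms($2)
        if (!(uid in first)) { first[uid] = ms; next } # first sighting
        lo = (ms < first[uid]) ? ms : first[uid]       # request time
        hi = (ms < first[uid]) ? first[uid] : ms       # response time
        hour = int(lo / 3600000)
        sum[hour] += hi - lo
        cnt[hour]++
        delete first[uid]                              # free memory early
    }
    END {
        for (h in sum) printf "%02d %d\n", h, sum[h] / cnt[h]
    }' "$@"
}
```

Run as `hourly_avg file`; on the four sample lines above it prints `08 31995` (the mean of 19400 ms and 44590 ms). The order of `for (h in sum)` is unspecified, so pipe the result through `sort` when the log spans several hours.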
Answer 1 (score: 3)
Extending Ed Morton's solution:
function parse_time (date, time, newtime) {
gsub(/-/, " ", date)
gsub(/:/, " ", time)
gsub(/,.*/, "", time)
newtime = date" "time
return newtime
}
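{
# Extract the UID once per line instead of re-running gensub() in every rule
uid = gensub(/.*;gt;([^&]+).*/,"\\1",1)
}
(uid in starttime) {
# Second sighting of this UID: treat it as the end time
endtime[uid] = parse_time($1, $2)
next
}
{
# First sighting of this UID: treat it as the start time
starttime[uid] = parse_time($1, $2)
}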
END {
for (x in starttime) {
# Look the UID up directly; a nested loop over both arrays is quadratic
if (x in endtime) {
diff = mktime(endtime[x]) - mktime(starttime[x])
diff = sprintf("%dh:%dm:%ds",diff/(60*60),diff%(60*60)/60,diff%60)
print x, diff
}
}
}
$ cat log.file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
$ awk -f script.awk log.file
904c-be-4e-bbda-3e62 0h:0m:20s
edfc-fr-5e-bced-3443 0h:0m:45s
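Note that `parse_time` strips the milliseconds (the `gsub(/,.*/, "", time)` call), so the durations above are whole seconds. Because the log covers a single day, a millisecond-precise variant can work from the time of day alone, without `mktime`. The following is a hedged sketch rather than part of the answer: the names `uid_durations_ms` and `to_ms` are hypothetical, and it assumes exactly two lines per UID, taking the absolute difference so that out-of-order lines still yield a positive duration.

```shell
# uid_durations_ms FILE...: print "<uid> <elapsed ms>" for each UID pair.
# Assumptions (not from the original post): two lines per UID, one day of data.
uid_durations_ms() {
    awk '
    # Convert "HH:MM:SS,mmm" to milliseconds since midnight.
    function to_ms(t,    p) {
        split(t, p, /[:,]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    match($0, /&lt;U[Ii][Dd]&gt;[^&]+/) {
        uid = substr($0, RSTART + 11, RLENGTH - 11)   # skip the 11-char tag
        ms = to_ms($2)
        if (uid in seen) {
            d = ms - seen[uid]
            if (d < 0) d = -d          # tolerate out-of-order lines
            print uid, d
            delete seen[uid]           # pair complete, release the entry
        } else {
            seen[uid] = ms
        }
    }' "$@"
}
```

Called as `uid_durations_ms log.file` on the sample above, this prints 19400 for the first UID and 44590 for the second, compared with the 20s and 45s obtained after the milliseconds are discarded.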