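I am trying to calculate the average response time per hour from a log file that has millions of records; an excerpt of the log is below.
So far, I am creating a temporary file that contains one line per unique ID with its start time and end time, and then running another script on that temporary file to compute the average response time for each hour. My script takes more than an hour just to create the temporary file.
Is there a way to do this faster, or a better-optimized script that takes less time? Note: these UNIQIDs do not appear in sequence.
Log file format: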
2012-06-04 13:04:19,324 UNIQID1
2012-06-04 13:04:20,120 UNIQID1
2012-06-04 13:05:19,324 UNIQID2
2012-06-04 13:06:20,120 UNIQID2
2012-06-04 13:07:19,324 UNIQID3
2012-06-04 13:08:20,120 UNIQID3
2012-06-04 13:08:49,324 UNIQID4
2012-06-04 13:09:50,120 UNIQID4
Here is my code:
uids=`cat $i|grep "UNIQ" |sort -u` >> $log
for uid in ${uids}; do
count=`grep "$uid" test.log|wc -l`
if [ "${count}" -ne "0" ]; then
unique_uids[counter]="$uid"
let counter=counter+1
fi
done
echo ${unique_uids[@]}
echo $counter
echo " Unique No:" ${#unique_uids[@]}
echo "uid StartTime EndTime" > $log
for unique_uids in ${unique_uids[@]} ; do
responseTime=`cat $i|grep "${unique_uids}" |awk '{split($2,Arr,":|,"); print Arr[1]*3600000+Arr[2]*60000+Arr[3]*1000+Arr[4]}'|sort -n`
echo $unique_uids $responseTime >> $log
done
Thanks for your time!
Answer 0 (score: 1)
Some simple fixes:
- Remove the cat calls; just pass the file name as the last argument to grep.
- A while IFS= read -r date time id loop may be faster.
Answer 1 (score: 0)
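Your script has several problems, and I think you will find the version below better suited to your needs. First, you do not need to spawn all those processes to get the job done; doing all of it in awk is fairly simple. Also, the code you posted assumes that a given UNIQID only occurs within a single date. If any of your records start before midnight and finish on the next day, that assumption will cause a lot of pain.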
The following awk script does what you want. It assumes you are using gawk (GNU awk). If you are not, you can find awk implementations of mktime on the web.
BEGIN {
    # Load the IDs we care about; awk keeps them in a hashed associative array
    while ((getline line < UIDFILE) > 0)
        want[line] = 1
    close(UIDFILE)
}
{
    r = $NF                             # Extract the unique ID from the record into r
    if (!(r in want))                   # Skip IDs we are not interested in
        next
    split($2, t, ",")                   # t[1] = HH:MM:SS, t[2] = milliseconds
    ts = $1 " " t[1]                    # Concatenate the date and the time
    gsub(/[:-]/, " ", ts)               # Replace the : and - with spaces for mktime
    # print ts, mktime(ts)              # If you want to see what mktime does
    now = mktime(ts) + t[2] / 1000      # Current log time in seconds, keeping the milliseconds
    if (!(r in start)) {                # First time seeing this unique ID?
        start[r] = now                  # Store the timestamp
        next
    }
    rt = now - start[r]                 # Second time: compute the delta
    delete start[r]                     # We don't need it any more
    # printf "Record <%s> has response time %f\n", r, rt   # Print it out if you'd like
    hour = $1 " " substr($2, 1, 2)      # Hour in which the record ends, e.g. "2012-06-04 13"
    sum[hour] += rt                     # Add it to that hour's total response time
    num[hour]++                         # And keep track of how many records end in that hour
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk 4+: visit the hours in chronological order
    for (hour in sum)
        printf "%s:00 average response time = %f\n", hour, sum[hour] / num[hour]
}
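To illustrate why the script converts each timestamp with mktime instead of working with the time of day alone, here is a minimal sketch of the conversion as a standalone shell function. The function name elapsed and its argument order are hypothetical and chosen only for this illustration; it assumes gawk is installed, since mktime is a gawk extension. Unlike a time-of-day calculation, the difference stays positive when a request starts before midnight and ends after it. The sketch also keeps the milliseconds after the comma, which is one possible choice rather than a requirement.

```shell
#!/usr/bin/env bash
# elapsed START_DATE START_TIME END_DATE END_TIME   (hypothetical helper)
# Prints the seconds between two log timestamps such as "2012-06-04 13:04:19,324".
# Assumes gawk, because mktime() is a gawk extension.
elapsed() {
    gawk -v d1="$1" -v t1="$2" -v d2="$3" -v t2="$4" '
        function epoch(d, t,    parts, spec) {
            split(t, parts, ",")        # parts[1] = HH:MM:SS, parts[2] = milliseconds
            spec = d " " parts[1]
            gsub(/[-:]/, " ", spec)     # mktime() expects "YYYY MM DD HH MM SS"
            return mktime(spec) + parts[2] / 1000
        }
        BEGIN { printf "%.3f\n", epoch(d2, t2) - epoch(d1, t1) }'
}
```

For example, elapsed 2012-06-04 13:04:19,324 2012-06-04 13:04:20,120 covers the first pair of lines in the log excerpt, and the same call works unchanged for a pair that straddles midnight.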
You need to invoke this script as follows, passing the log file as the last argument:
gawk -v UIDFILE=name_of_uid_file -f scriptname.awk name_of_log_file
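The invocation expects a file of unique IDs, which the answer does not show how to build. As a rough sketch, assuming the ID is always the last whitespace-separated field of each log line (as in the excerpt), such a file could be produced in a single pass over the log. The function name make_uid_file and the file name uids.txt are hypothetical.

```shell
#!/usr/bin/env bash
# make_uid_file LOGFILE UIDFILE   (hypothetical helper)
# Writes every distinct ID (the last field of each log line) to UIDFILE, one per line.
make_uid_file() {
    awk '{ print $NF }' "$1" | sort -u > "$2"
}
```

With the log from the question, this could be used as make_uid_file test.log uids.txt, followed by gawk -v UIDFILE=uids.txt -f scriptname.awk test.log. This reads the log once, rather than running one grep per ID.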