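I have a shell script that is invoked once an hour by a cron job. It searches the Asterisk logs and gives me the unique IDs of the calls that ended with cause 31.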
while read ref
do
cat sample.log | grep "$ref" | grep 'got hangup request, cause 31' | grep -o 'C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]' >> cause_temp.log
done < callref.log
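The problem is that the while loop is too slow, and for accuracy I have included four such while loops that perform various checks.
The callref.log file consists of call identifier values. Each hour it holds roughly 500,000 to 900,000 values, and the script takes about 45-50 minutes to finish and send me the report.
It would be very helpful if I could reduce the execution time of the loops. Since sample.log is about 20 GB and every iteration opens the file and searches it, I believe the while loop is the bottleneck here.
However, the solutions suggested so far are ones I either cannot implement or do not know how to implement. Any suggestions would be helpful. Thanks.
Because sample.log contains sensitive information I cannot share any of the real logs, but below are some sample log lines I took from the internet.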
Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"503"<sip:503@192.168.1.107>' failed for '192.168.1.137' - Wrong password
Dec 16 18:03:13 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"502"<sip:502@192.168.1.107>' failed for '192.168.1.137' - Wrong password
Dec 16 18:04:49 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"1737245082"<sip:1737245082@192.168.1.107>' failed for '192.168.1.137' - Username/auth name mismatch
Dec 16 18:04:49 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"100"<sip:100@192.168.1.107>' failed for '192.168.1.137' - Username/auth name mismatch
Jun 27 18:09:47 host asterisk[31774]: ERROR[27910]: chan_zap.c:10314 setup_zap: Unable to register channel '1-2'
Jun 27 18:09:47 host asterisk[31774]: WARNING[27910]: loader.c:414 __load_resource: chan_zap.so: load_module failed, returning -1
Jun 27 18:09:47 host asterisk[31774]: WARNING[27910]: loader.c:554 load_modules: Loading module chan_zap.so failed!
The file callref.log contains a list of lines that look like this:
C-001ec22d
C-001ec23d
C-001ec24d
C-001ec31d
C-001ec80d
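The desired output of the while loop above also looks like C-001ec80d.
My main concern is making the while loop run faster, for example by loading all the values from callref.log into an array and, if possible, searching for all of them at once in a single pass over sample.log.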
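Answer 0 (score: 0)
I spent a day building a test framework and testing variations of different commands, and I think you already have the fastest one.
This makes me think that if you want better performance you should look into a log digesting framework such as ossec (which is where your log samples came from) or perhaps splunk. Those may be too heavyweight for what you want. Alternatively, consider designing and building something in java/C/perl/awk that is better suited to parsing.
Running the existing script more frequently would also help.
Good luck! If you like I can box up the work I did and post it here, but I think it is overkill.
As requested, CalFuncs.sh, a library I source in most of my scripts: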
#!/bin/bash
LOGDIR="/tmp"
LOG=$LOGDIR/CalFunc.log
[ ! -d "$LOGDIR" ] && mkdir -p $(dirname $LOG)
SSH_OPTIONS="-o StrictHostKeyChecking=no -q -o ConnectTimeout=15"
SSH="ssh $SSH_OPTIONS -T"
SCP="scp $SSH_OPTIONS"
SI=$(basename $0)
Log() {
echo "`date` [$SI] $@" >> $LOG
}
Run() {
Log "Running '$@' in '`pwd`'"
$@ 2>&1 | tee -a $LOG
}
RunHide() {
Log "Running '$@' in '`pwd`'"
$@ >> $LOG 2>&1
}
PrintAndLog() {
Log "$@"
echo "$@"
}
ErrorAndLog() {
Log "[ERROR] $@ "
echo "$@" >&2
}
showMilliseconds(){
# NOTE: despite the name, this prints whole seconds since the epoch
date +%s
}
runMethodForDuration(){
local startT=$(showMilliseconds)
$1
local endT=$(showMilliseconds)
local totalT=$((endT-startT))
PrintAndLog "that took $totalT seconds to run $1"
echo $totalT
}
genCallRefLog.sh - generates a dummy callref.log whose size is given by the argument:
#!/bin/bash
#Script to make 80000 sequential lines of callref.log this should suffice for a POC
if [ -z "$1" ] ; then
echo "genCallRefLog.sh requires an integer of the number of lines to pump out of callref.log"
exit 1
fi
file="callref.log"
[ -f "$file" ] && rm -f "$file" # del file if exists
i=0 #put start num in here
j="$1" #put end num in here
echo "building $j lines of callref.log"
for (( a=i ; a < j; a++ ))
do
printf 'C-%08x\n' "$a" >> $file
done
genSampleLog.sh - generates a dummy sample.log whose size is given by the argument:
#!/bin/bash
#Script to make sequential lines of sample.log; this should suffice for a POC
if [ -z "$1" ] ; then
echo "genSampleLog.sh requires an integer of the number of lines to pump out of sample.log"
exit 1
fi
file="sample.log"
[ -f "$file" ] && rm -f "$file" # del file if exists
i=0 #put start num in here
j="$1" #put end num in here
echo "building $j lines of sample.log"
for (( a=i ; a < j; a++ ))
do
printf 'Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: C-%08x got hangup request, cause 31\n' "$a" >> $file
done
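Finally, here is the actual test script I used. I usually comment out the build scripts, since they only need to run when the log size changes. I also typically run only one test function at a time and record the results.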
test.sh
#!/bin/bash
source "./CalFuncs.sh"
targetLogFile="cause_temp.log"
Log "Starting"
checkTargetFileSize(){
expectedS="$1"
hasS=$(cat $targetLogFile | wc -l)
if [ "$expectedS" != "$hasS" ] ; then
ErrorAndLog "Got $hasS but expected $expectedS, when inspecting $targetLogFile"
exit 244
fi
}
standard(){
iter=0
while read ref
do
cat sample.log | grep "$ref" | grep 'got hangup request, cause 31' | grep -o 'C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]' >> $targetLogFile
done < callref.log
}
subStandardVarient(){
iter=0
while read ref
do
cat sample.log | grep 'got hangup request, cause 31' | grep -o "$ref" >> $targetLogFile
done < callref.log
}
newFunction(){
grep -f callref.log sample.log | grep 'got hangup request, cause 31' >> $targetLogFile
}
newFunction4(){
grep 'got hangup request, cause 31' sample.log | grep -of 'callref.log'>> $targetLogFile
}
newFunction5(){
#splitting grep
grep 'got hangup request, cause 31' sample.log > /tmp/somefile
grep -of 'callref.log' /tmp/somefile >> $targetLogFile
}
newFunction2(){
iter=0
while read ref
do
((iter++))
echo "$ref" | grep 'got hangup request, cause 31' | grep -of 'callref.log' >> $targetLogFile
done < sample.log
}
newFunction3(){
iter=0
pat=""
while read ref
do
if [[ "$pat." != "." ]] ; then
pat="$pat|"
fi
pat="$pat$ref"
done < callref.log
# Log "Have pattern $pat"
while read ref
do
((iter++))
echo "$ref" | grep 'got hangup request, cause 31' | grep -oP "$pat" >> $targetLogFile
done < sample.log
#grep: regular expression is too large
}
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
numLines="100000"
Log "testing algorithms with $numLines in each log file."
setupCallRef(){
./genCallRefLog.sh $numLines
}
setupSampleLog(){
./genSampleLog.sh $numLines
}
setupCallRef
setupSampleLog
runMethodForDuration standard > /dev/null
checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration subStandardVarient > /dev/null
checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration newFunction > /dev/null
checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction2 > /dev/null
# checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction3 > /dev/null
# checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction4 > /dev/null
# checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration newFunction5 > /dev/null
checkTargetFileSize "$numLines"
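The above shows that the existing method was always faster than anything I came up with. I think someone has already taken care to optimize it.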
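One variant the test script does not time is fixed-string matching. The following is a minimal sketch, assuming GNU grep, that reads the log once and treats every ID in callref.log as a literal string (-F) rather than a regular expression. The temporary directory and the three log lines are made-up stand-ins, and whether this actually beats the loop on a 20 GB log has not been measured here.

```shell
#!/bin/bash
# Sketch only: tiny stand-in files (hypothetical data) so the pipeline can be tried safely.
workdir=$(mktemp -d)
printf 'C-001ec22d\nC-001ec80d\n' > "$workdir/callref.log"
cat > "$workdir/sample.log" <<'EOF'
Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: C-001ec22d got hangup request, cause 16
Dec 16 18:02:05 asterisk1 asterisk[31774]: NOTICE[31787]: C-001ec80d got hangup request, cause 31
Dec 16 18:02:06 asterisk1 asterisk[31774]: NOTICE[31787]: C-0000beef got hangup request, cause 31
EOF

# One pass over sample.log: keep only the cause-31 lines, then print just the
# parts of those lines that match any ID listed in callref.log.
# -F = fixed strings, -o = print only the matching part, -f = read patterns from a file.
result=$(grep -F 'got hangup request, cause 31' "$workdir/sample.log" \
  | grep -oFf "$workdir/callref.log")
echo "$result"

rm -rf "$workdir"
```

To time it in the framework above, the same pipeline could be wrapped in another function and passed to runMethodForDuration.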
Answer 1 (score: 0)
Since no adequate sample logs were provided for testing, even on request, I whipped up some test material myself:
$ cat callref.log
a
b
$ cat sample.log
a 1
b 2
c 1
Using awk:
$ awk 'NR==FNR { # hash callrefs
a[$1]
next
}
{ # check callrefs from sample records and output when match
for(l in a)
if($0 ~ l && $0 ~ 1) # 1 is the static string you look for along a callref
print l
}' callref.log sample.log
a
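The for (l in a) loop compares every sample line against every callref, which grows expensive with a large callref.log. A possible alternative is to extract the identifier from the line and look it up in the hash directly. This is only a sketch: it assumes each log line carries at most one C-xxxxxxxx identifier (the question does not confirm this), and the files below are made-up stand-ins.

```shell
#!/bin/bash
# Sketch only: tiny stand-in files (hypothetical data) in a temporary directory.
workdir=$(mktemp -d)
printf 'C-001ec22d\nC-001ec80d\n' > "$workdir/callref.log"
cat > "$workdir/sample.log" <<'EOF'
Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: C-001ec22d got hangup request, cause 16
Dec 16 18:02:05 asterisk1 asterisk[31774]: NOTICE[31787]: C-001ec80d got hangup request, cause 31
Dec 16 18:02:06 asterisk1 asterisk[31774]: NOTICE[31787]: C-0000beef got hangup request, cause 31
EOF

result=$(awk '
NR == FNR { refs[$1]; next }        # first file: remember every callref as a hash key
/got hangup request, cause 31/ {    # second file: look only at cause-31 lines
    # Extract the first C-xxxxxxxx token and test it with one hash lookup.
    if (match($0, /C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]/)) {
        id = substr($0, RSTART, RLENGTH)
        if (id in refs) print id
    }
}' "$workdir/callref.log" "$workdir/sample.log")
echo "$result"

rm -rf "$workdir"
```

With this shape the work per log line no longer depends on how many callrefs were loaded, at the cost of holding all of them in memory.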
HTH