Question

I have a shell script that a cron job runs once an hour. It searches the Asterisk logs and gives me the unique IDs of calls that ended with cause 31.

while read ref
do
cat sample.log | grep "$ref" | grep 'got hangup request, cause 31' | grep -o 'C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]' >> cause_temp.log
done < callref.log

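The problem is that the while loop is too slow, and for accuracy I have included 4 such while loops to perform various checks.

The callref.log file consists of call identifier values. Every hour it holds roughly 500,000 to 900,000 values, and the script takes about 45-50 minutes to finish and send me the report.

It would be very helpful if I could cut the execution time of the loops. Since sample.log is about 20 GB and every iteration opens the file and searches it, I believe the while loop is the bottleneck here.

I did some research and found a few useful links: Link 1, Link 2

However, the suggested solutions are ones I either cannot implement or do not know how to implement. Any suggestions would help. Thanks.

Since sample.log contains sensitive information I cannot share any of the logs, but below are some sample log lines I took from the internet.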

Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"503"<sip:503@192.168.1.107>' failed for '192.168.1.137' - Wrong password
Dec 16 18:03:13 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"502"<sip:502@192.168.1.107>' failed for '192.168.1.137' - Wrong password
Dec 16 18:04:49 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"1737245082"<sip:1737245082@192.168.1.107>' failed for '192.168.1.137' - Username/auth name mismatch
Dec 16 18:04:49 asterisk1 asterisk[31774]: NOTICE[31787]: chan_sip.c:11242 in handle_request_register: Registration from '"100"<sip:100@192.168.1.107>' failed for '192.168.1.137' - Username/auth name mismatch
Jun 27 18:09:47 host asterisk[31774]: ERROR[27910]: chan_zap.c:10314 setup_zap: Unable to register channel '1-2'
Jun 27 18:09:47 host asterisk[31774]: WARNING[27910]: loader.c:414 __load_resource: chan_zap.so: load_module failed, returning -1
Jun 27 18:09:47 host asterisk[31774]: WARNING[27910]: loader.c:554 load_modules: Loading module chan_zap.so failed!

The file callref.log contains a list of lines that look like this:

C-001ec22d
C-001ec23d
C-001ec24d
C-001ec31d
C-001ec80d

The desired output of the while loop above also looks like C-001ec80d.

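My main concern is making the while loop run faster, for example by loading all the values from callref.log into an array and, if possible, searching for all of them at once in a single pass over sample.log.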

Answer 1

I spent a day building a test framework and testing variants of different commands, and I think you already have the fastest one.

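This makes me think that if you want better performance you should look into a log digesting framework such as ossec (which is where your log sample came from) or perhaps splunk. Those may be too unwieldy for what you want. Alternatively, you should consider designing and building something better suited to parsing in java / C / perl / awk.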

Running the existing script more frequently would also help.
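One way frequent runs could stay cheap is to read only what was appended since the previous run. The following is a minimal sketch of that idea, not something I measured: the function name new_log_lines and the offset file are made up for illustration, and it assumes the log is only ever appended to or rotated (truncated).

```shell
#!/bin/bash
# Sketch only: new_log_lines and the offset file are hypothetical names.
# Prints the part of a log that was appended since the previous call.
# Usage: new_log_lines sample.log sample.log.offset
new_log_lines() {
  local log="$1" state="$2" last=0 size
  [ -f "$state" ] && last=$(cat "$state")
  size=$(wc -c < "$log")
  size=$((size))                      # strip padding some wc versions add
  # a smaller size than last time means the log was rotated, so start over
  [ "$size" -lt "$last" ] && last=0
  # emit exactly the bytes between the old offset and the current size
  tail -c +"$((last + 1))" "$log" | head -c "$((size - last))"
  echo "$size" > "$state"
}
```

The output could then be piped into the existing grep chain instead of reading all of sample.log each time.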

Good luck! If you like, I can box up the work I did and post it here, but I think it is overkill.

As requested; CalFuncs.sh: a library I source in most of my scripts

#!/bin/bash

LOGDIR="/tmp"
LOG=$LOGDIR/CalFunc.log
[ ! -d "$LOGDIR" ] && mkdir -p "$LOGDIR"

SSH_OPTIONS="-o StrictHostKeyChecking=no -q -o ConnectTimeout=15"
SSH="ssh $SSH_OPTIONS -T"
SCP="scp $SSH_OPTIONS"
SI=$(basename $0)

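Log() {
    echo "$(date) [$SI] $*" >> "$LOG"
}

# Run a command, showing its output and appending it to the log
Run() {
    Log "Running '$*' in '$(pwd)'"
    "$@" 2>&1 | tee -a "$LOG"
}

# Run a command, sending its output only to the log
RunHide() {
    Log "Running '$*' in '$(pwd)'"
    "$@" >> "$LOG" 2>&1
}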

PrintAndLog() {
    Log "$@"
    echo "$@"
}

ErrorAndLog() {
    Log "[ERROR] $@ "
    echo "$@" >&2
}

# Despite its name, this returns whole seconds since the epoch (date +%s)
showMilliseconds(){
  date +%s
}

runMethodForDuration(){
  local startT=$(showMilliseconds)
  $1
  local endT=$(showMilliseconds)
  local totalT=$((endT-startT))
  PrintAndLog "that took $totalT seconds to run $1"
  echo $totalT
}

genCallRefLog.sh: generates a dummy callref.log whose size depends on the argument

#!/bin/bash
#Script to make 80000 sequential lines of callref.log this should suffice for a POC
if [ -z "$1" ] ; then
  echo "genCallRefLog.sh requires an integer of the number of lines to pump out of callref.log"
  exit 1
fi
file="callref.log"
[ -f "$file" ] && rm -f "$file"  # del file if exists
i=0 #put start num in here
j="$1" #put end num in here
echo "building $j lines of callref.log"
for ((  a=i ;  a < j;  a++  ))
do
  printf 'C-%08x\n' "$a" >> $file
done

genSampleLog.sh: generates a dummy sample.log whose size depends on the argument

#!/bin/bash
#Script to make sequential lines of sample.log, as many as the argument asks for; this should suffice for a POC
if [ -z "$1" ] ; then
  echo "genSampleLog.sh requires an integer of the number of lines to pump out of sample.log"
  exit 1
fi
file="sample.log"
[ -f "$file" ] && rm -f "$file"  # del file if exists
i=0 #put start num in here
j="$1" #put end num in here
echo "building $j lines of sample.log"
for ((  a=i ;  a < j;  a++  ))
do
  printf 'Dec 16 18:02:04 asterisk1 asterisk[31774]: NOTICE[31787]: C-%08x got hangup request, cause 31\n' "$a" >> $file
done

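And finally the actual test script I used. Normally I comment out the build scripts, since they only need to run when the log size changes. I also usually run only one test function at a time and record the result.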

test.sh

#!/bin/bash
source "./CalFuncs.sh"

targetLogFile="cause_temp.log"
Log "Starting"

checkTargetFileSize(){
  expectedS="$1"
  hasS=$(cat $targetLogFile | wc -l)
  if [ "$expectedS" != "$hasS" ] ; then
    ErrorAndLog "Got $hasS but expected $expectedS, when inspecting $targetLogFile"
    exit 244
  fi
}

standard(){
  iter=0
  while read ref
  do
    cat sample.log | grep "$ref" | grep 'got hangup request, cause 31' | grep -o 'C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]' >> $targetLogFile
  done < callref.log
}

subStandardVarient(){
  iter=0
  while read ref
  do
    cat sample.log | grep 'got hangup request, cause 31' | grep -o "$ref"  >> $targetLogFile
  done < callref.log
}

newFunction(){
  grep -f callref.log sample.log | grep 'got hangup request, cause 31'  >> $targetLogFile
}

newFunction4(){
  grep 'got hangup request, cause 31' sample.log | grep -of 'callref.log'>> $targetLogFile
}

newFunction5(){
  #splitting grep
  grep 'got hangup request, cause 31' sample.log > /tmp/somefile
  grep -of 'callref.log' /tmp/somefile >> $targetLogFile
}

newFunction2(){
  iter=0

  while read ref
  do
    ((iter++))
    echo "$ref" | grep 'got hangup request, cause 31' | grep -of 'callref.log' >> $targetLogFile
  done < sample.log
}

newFunction3(){
  iter=0
  pat=""
  while read ref
  do
    if [[ "$pat." != "." ]] ; then
      pat="$pat|"
    fi
    pat="$pat$ref"
  done < callref.log
  # Log "Have pattern $pat"
  while read ref
  do
    ((iter++))
    echo "$ref" | grep 'got hangup request, cause 31' | grep -oP "$pat" >> $targetLogFile
  done < sample.log
  #grep: regular expression is too large
}

[ -f "$targetLogFile" ] && rm -f "$targetLogFile"

numLines="100000"
Log "testing algorithms with $numLines in each log file."

setupCallRef(){
  ./genCallRefLog.sh $numLines
}

setupSampleLog(){
  ./genSampleLog.sh $numLines
}

setupCallRef
setupSampleLog

runMethodForDuration standard > /dev/null
checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration subStandardVarient > /dev/null
checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration newFunction > /dev/null
checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction2 > /dev/null
# checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction3 > /dev/null
# checkTargetFileSize "$numLines"
# [ -f "$targetLogFile" ] && rm -f "$targetLogFile"
# runMethodForDuration newFunction4 > /dev/null
# checkTargetFileSize "$numLines"
[ -f "$targetLogFile" ] && rm -f "$targetLogFile"
runMethodForDuration newFunction5 > /dev/null
checkTargetFileSize "$numLines"

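The above shows that the existing approach was always faster than anything I came up with. I think someone has taken care to optimize it.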
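One variant that test.sh does not cover is treating the callrefs as fixed strings instead of regular expressions, using grep's -F option. The sketch below is untimed and only shows the shape of such a test function; the name newFunctionFixed and its two positional arguments are made up for illustration.

```shell
#!/bin/bash
# Sketch only: newFunctionFixed is a hypothetical addition, not part of test.sh.
# Usage: newFunctionFixed callref.log sample.log
newFunctionFixed() {
  # first keep only the cause 31 hangup lines (fixed-string match),
  # then print every callref from the list that occurs in them
  grep -F 'got hangup request, cause 31' "$2" | grep -oFf "$1"
}
```

It would need to be run through runMethodForDuration like the others before drawing any conclusion about its speed.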

Answer 2

Since no adequate sample logs could be produced for testing, even on request, I whipped up some test material myself:

$ cat callref.log
a
b
$ cat sample.log
a 1
b 2
c 1

Using awk:

$ awk 'NR==FNR {             # hash callrefs
    a[$1]
    next
}
{                            # check callrefs from sample records and output when match
    for(l in a)
        if($0 ~ l && $0 ~ 1) # 1 is the static string you look for along a callref
            print l
}' callref.log sample.log
a
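The loop above tests every callref against every log record, so the work grows with the product of the two file sizes. A possible variant, sketched here on the assumption that each relevant line carries one ID of the form C- followed by eight characters (the same pattern as the grep in the question), pulls the ID out with match() and looks it up directly in the hash. The function name find_cause31 is made up for illustration.

```shell
#!/bin/bash
# Sketch only: find_cause31 is a hypothetical helper name.
# Usage: find_cause31 callref.log sample.log
find_cause31() {
  awk '
    NR == FNR { refs[$1]; next }          # first file: remember every callref
    /got hangup request, cause 31/ {      # second file: only cause 31 hangups
      # pull out the first C-xxxxxxxx token on the line
      if (match($0, /C-[0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z][0-9a-z]/)) {
        id = substr($0, RSTART, RLENGTH)
        if (id in refs) print id          # direct hash lookup, no inner loop
      }
    }
  ' "$1" "$2"
}
```

Whether this beats the grep pipeline on a 20 GB log would have to be measured. Note that the NR == FNR idiom assumes callref.log is not empty.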

HTH

Grepping a large number of patterns from a huge log file
