Question

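For about 3 years I have been using a shell script that starts tail -F -n0 /path/to/LOGFILE, parses and formats the output, and dumps it into a pipe-delimited file. However, we have gone from a few thousand log lines a day to a few million, and the script has started to eat a lot of memory and CPU.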

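I recently replaced all of the data-parsing logic with awk, which in testing appears to be many orders of magnitude faster at parsing the data. To test the new awk code, I ran a whole day's worth of logs (about 6 million lines) through both the awk code and the shell code. Unsurprisingly, the awk code pulled out the ~92K relevant lines in ~10 seconds, while the shell code took 15 minutes to do the same thing. However, if I use the exact same code, but instead of cat /path/to/file|awk '...' I run tail -F -n0 /path/to/file|awk '...', there is a huge delay before text is written to the file, up to ~2-3 minutes, as opposed to ~0.5-1.0 seconds with the shell code.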

Shell code (yes, I know how ugly shell code is):

outDir="/opt/redacted/logs/allerrors/"
tail -F -n0 /opt/redacted/logs/manager.log|while read -a logLine;do
if [[ "${logLine[2]}" == "E" ]];then
  if [[ "${logLine[7]:0:1}" == "@" ]];then
    echo "${logLine[0]}|${logLine[1]}|${logLine[6]}|${logLine[7]}|${logLine[@]:8:${#logLine[@]}}" >> ${outDir}allerrors.${logLine[0]//\//.}.log
  else
    echo "${logLine[0]}|${logLine[1]}|${logLine[6]}|NULL|${logLine[@]:7:${#logLine[@]}}" >> ${outDir}allerrors.${logLine[0]//\//.}.log
  fi
fi

done

awk code:

  outDir="/opt/redacted/logs/allerrors/"
  tail -F -n0 /opt/redacted/logs/manager.log|awk -v dir=$outDir '{OFS="|"}
{
  if ($3 == "E")
  {
    file="allerrors."$1".log"
    gsub("/",".",file)
    if ($8 ~ /@/)
      print $1,$2,$7,$8,substr($0, index($0,$9)) >> dir file
    else {if ($8 !~ /@/)
      print $1,$2,$7,"NULL",substr($0, index($0,$8)) >> dir file
    }
  }
}'

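To be clear, both sets of code work and create identical output if I cat the file instead of tailing it. With the awk code, though, I don't see results in the output file until ~2-3 minutes after they appear in the log, whereas the shell version takes only a few seconds.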

Answer 1

By default, awk buffers its output, while sh does not. This increases awk's throughput, but it also increases its latency.

Just add fflush(); to your awk code to force the buffer to be flushed:

  outDir="/opt/redacted/logs/allerrors/"
  tail -F -n0 /opt/redacted/logs/manager.log|awk -v dir=$outDir '{OFS="|"}
{
  if ($3 == "E")
  {
    file="allerrors."$1".log"
    gsub("/",".",file)
    if ($8 ~ /@/)
      print $1,$2,$7,$8,substr($0, index($0,$9)) >> dir file
    else {if ($8 !~ /@/)
      print $1,$2,$7,"NULL",substr($0, index($0,$8)) >> dir file
    }
    fflush();
  }
}'
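
One portability caveat, offered as a sketch rather than a definitive fix: what a bare fflush() flushes is not the same in every awk (older gawk releases, for instance, are documented as flushing only standard output), so passing the output file name explicitly may be the safer form. The temporary directory and the three sample log lines below are made up for illustration, and the field layout is only assumed to match the one in the question.

```shell
#!/bin/sh
# Illustrative sketch: the temp directory and sample log lines are hypothetical.
outDir="$(mktemp -d)/"

printf '%s\n' \
  '2024/01/01 12:00:00 E f4 f5 f6 mod @host something broke' \
  '2024/01/01 12:00:01 I f4 f5 f6 mod ignored line' \
  '2024/01/01 12:00:02 E f4 f5 f6 mod plain failure' |
awk -v dir="$outDir" 'BEGIN { OFS = "|" }
$3 == "E" {
  # Build the per-day file name, e.g. allerrors.2024.01.01.log
  file = "allerrors." $1 ".log"
  gsub("/", ".", file)
  out = dir file
  if ($8 ~ /@/)
    print $1, $2, $7, $8, substr($0, index($0, $9)) >> out
  else
    print $1, $2, $7, "NULL", substr($0, index($0, $8)) >> out
  fflush(out)   # flush this one output file after every matching line
}'

# Show what ended up in the output file.
cat "${outDir}allerrors.2024.01.01.log"
```

To try it against the live log, the printf would be swapped for the tail -F -n0 command from the question. Flushing after every matching line trades some of awk's throughput for lower latency, which is the trade-off described above.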

Answer 2

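tail -f -n0 filename blocks until there is new I/O on the file, so, as has already been suggested, this is an input problem tied to the buffering of stdout in the awk process. However, awk I/O buffering cannot fully account for the time lag if the log file has no changes for 2 minutes.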

Try this to see the effect of line buffering:

while true
do
  echo -n 'this is a string'
  sleep 5
  echo ' add a newline'
done | awk '{print $0}'

Why is printing data to a file with awk so much slower than with the shell?
