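I am trying to write a (fairly) simple log parser to help me debug application failures.
What I am currently trying to do is find every instance of "Connection timed out", and then find the "Processing file" string that occurs 10-30 lines above it (not always the same number of lines).
My code currently looks like this: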
#!/bin/bash
# get the line numbers of all instances of "Connection timed out"
connectionTimeOutLines=$(zcat filename | grep -n "Connection timed out" | cut -f1 -d:)
# left unquoted on purpose so the loop runs once per line number
for timeOutLine in $connectionTimeOutLines
do
    # get the date and time the event was logged
    logDate=$(zcat filename | sed "${timeOutLine}q;d" | awk '{print $1}' | awk '{print substr($0,2)}')
    logTime=$(zcat filename | sed "${timeOutLine}q;d" | awk '{print $2}')
    # need to get the "file processed line" here
    fileProcessed="unsure what I am doing here"
    echo "$fileProcessed timed out at $logTime on $logDate" >> /tmp/logFile.log
done
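I have trimmed part of the code for brevity, since it has no bearing on the question, which is: how do I find the instance of one string that precedes another string?
I cannot search purely on "Processing file", because that string is logged every time a file is processed, and I am only looking for the instances where processing failed ("Connection timed out").
TBH, I am not 100% sure I have explained myself properly, so I apologise in advance. Please ask for clarification if needed!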
Answer 0 (score: 1)
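To solve this, first get rid of the uncertainty in the input:
... the "Processing file" string that occurs 10-30 lines above "Connection timed out" (not always the same number of lines)
Simply drop every line except the interesting ones (those containing "Processing file" or "Connection timed out"):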
zcat filename | grep "Processing file\|Connection timed out"
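To see what that filter leaves behind, here is a small sketch that runs it on a made-up sample. The sample lines and the function names `filter_interesting` and `sample_log` are assumptions for illustration only, not taken from the real log. It uses `grep -E` with `|`, which should behave the same as the `\|` spelling above on GNU grep.

```shell
#!/bin/bash
# Keep only the lines that matter: file announcements and timeouts.
filter_interesting() {
    grep -E 'Processing file|Connection timed out'
}

# Hypothetical sample log, made up for this illustration.
sample_log() {
    printf '%s\n' \
        '2016-06-24 01:23:45 Processing file xxx' \
        'some unrelated line' \
        '2016-06-24 01:23:46 Processing file yyy' \
        'another unrelated line' \
        '2016-06-24 01:23:51 Connection timed out'
}

sample_log | filter_interesting
```

Once the unrelated lines are gone, the line directly above each "Connection timed out" is the most recent "Processing file" line, so the 10-30 line gap no longer matters.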
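I am pretty sure you can extract the data you need from the preprocessed input entirely on your own. Nevertheless, a complete working solution follows: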
detect_timed_out_files
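#!/bin/bash
# Reads the log on stdin. Relies on GNU grep/sed extensions (\|, \+, \S).
F='Processing file'
T='Connection timed out'
# grep keeps only the two kinds of interesting lines.
# First sed rule: reduce a "Processing file" line to the file name, hold it, print nothing.
# Second sed rule: append the timeout line to the held name, swap it in, rewrite it as the report.
grep "$F\|$T" \
| sed -e "/$F/ {s/.\+Processing file \(.\+\)/\1/; h; d;}" \
-e "/$T/ {H;x;s/\(\S\+\)\n\(\S\+\) \(\S\+\).*/\1 timed out at \3 on \2/}"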
Test input:
2016-06-24 01:23:45 Processing file xxx
Humpty
Dumpty
sat
2016-06-24 01:23:46 Processing file yyy
on
a
wall
2016-06-24 01:23:51 Connection timed out
Humpty
2016-06-24 01:23:52 Processing file zzz
Dumpty
had
a
2016-06-24 01:23:53 Processing file abc
2016-06-24 01:23:59 Connection timed out
great
fall
Output:
$ cat input|./detect_timed_out_files
yyy timed out at 01:23:51 on 2016-06-24
abc timed out at 01:23:59 on 2016-06-24
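The same "remember the last file, report it on a timeout" idea can also be sketched as a single awk pass, which some may find easier to read than the sed hold-space version. This is only a sketch under assumptions: the function name `report_timeouts` is hypothetical, each relevant line is assumed to have the form `<date> <time> <message>` as in the test input above, and file names are assumed to contain no spaces.

```shell
#!/bin/bash
# Sketch: remember the most recent "Processing file" name and report it
# whenever a "Connection timed out" line shows up. Reads the log on stdin.
# Assumed line format (as in the test input): <date> <time> <message>
report_timeouts() {
    awk '
        /Processing file/ {
            current = $NF    # last field is the file name (assumes no spaces)
        }
        /Connection timed out/ {
            # Only report if a file name has been seen before this timeout.
            if (current != "")
                print current " timed out at " $2 " on " $1
        }
    '
}
```

It could be used as `zcat filename | report_timeouts >> /tmp/logFile.log`. No separate grep step is needed here, because awk simply ignores the lines that match neither pattern.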