Question

I am trying to write a (fairly) simple log parser to help me debug application failures.

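What I am currently trying to do is find every instance of "Connection timed out", and then find the "Processing file" string that appears 10-30 lines above that "Connection timed out" (not always the same number of lines).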

My code currently looks like this:

#!/bin/bash
connectionTimeOutLines=`zcat filename | grep -n "Connection timed out" | cut -f1 -d:` #get the line number of all instances of connection timed out
# left unquoted so the loop runs once per line number rather than once for the whole list
for timeOutLine in $connectionTimeOutLines
do
     # get the date and time the event was logged
     logDate=`zcat filename | sed "${timeOutLine}q;d" | awk '{print $1}' | awk '{print substr($0,2)}'`
     logTime=`zcat filename | sed "${timeOutLine}q;d" | awk '{print $2}'`
     # need to get the "file processed line" here
     fileProcessed="unsure what I am doing here"
     echo "$fileProcessed timed out at $logTime on $logDate" >> /tmp/logFile.log
done

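I have edited out parts of the code for brevity, since they have no bearing on the question, which is: how do I find the instance of one string that precedes another string?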

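I cannot search purely for "Processing file", because that string is logged every time a file is processed, whereas I am looking for the instances where processing failed ("Connection timed out").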

TBH, I am not 100% sure I have explained myself properly, so I apologise in advance. Please ask for clarification where necessary!

Answer 1

To solve this problem, you first have to eliminate the uncertainty contained in the input:

... the "Processing file" string that appears 10-30 lines above that "Connection timed out" (not always the same number of lines)

Simply remove every line except the ones of interest (those containing "Processing file" or "Connection timed out"):

zcat filename | grep "Processing file\|Connection timed out"
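The effect of that filter can be checked on a few inline lines. The following is only a minimal sketch: filter_interesting is a hypothetical helper name, the sample lines are borrowed from the test input further down, and grep -E is used here as an assumed, more portable spelling of the same alternation.

```shell
#!/bin/bash
# filter_interesting is a hypothetical helper name, not part of the original answer.
# It keeps only the two kinds of lines that matter and drops everything else.
filter_interesting() {
    grep -E 'Processing file|Connection timed out'
}

# Demonstration on inline sample data: the three noise lines are dropped,
# the "Processing file" line and the "Connection timed out" line remain.
filter_interesting <<'EOF'
2016-06-24 01:23:46 Processing file yyy
on
a
wall
2016-06-24 01:23:51 Connection timed out
EOF
```

Once the noise is gone, every "Connection timed out" line is directly preceded by the "Processing file" line it belongs to, so the variable 10-30 line distance no longer matters.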

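I am pretty sure you can extract the required data from the preprocessed input entirely on your own. Nevertheless, a complete working solution follows:

detect_timed_out_files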

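#!/bin/bash
# Reads log lines on stdin. Relies on GNU grep/sed extensions (\|, \+, \S).

F='Processing file'
T='Connection timed out'

# 1. grep keeps only the "Processing file" and "Connection timed out" lines.
# 2. On a "Processing file" line, sed reduces the line to the file name,
#    stores it in the hold space (h) and prints nothing (d).
# 3. On a "Connection timed out" line, sed appends the line to the stored
#    file name (H), swaps the result into the pattern space (x) and rewrites it.
grep "$F\|$T"                                                                  \
| sed -e "/$F/ {s/.\+Processing file \(.\+\)/\1/; h; d;}"                      \
      -e "/$T/ {H;x;s/\(\S\+\)\n\(\S\+\) \(\S\+\).*/\1 timed out at \3 on \2/}"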

Test input:

2016-06-24 01:23:45 Processing file xxx
Humpty
Dumpty
sat
2016-06-24 01:23:46 Processing file yyy
on
a
wall
2016-06-24 01:23:51 Connection timed out
Humpty
2016-06-24 01:23:52 Processing file zzz
Dumpty
had
a
2016-06-24 01:23:53 Processing file abc
2016-06-24 01:23:59 Connection timed out
great
fall

Output:

$ cat input|./detect_timed_out_files 
yyy timed out at 01:23:51 on 2016-06-24
abc timed out at 01:23:59 on 2016-06-24
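The same "remember the last file, report it on a timeout" idea can also be sketched in a single awk pass, which some readers may find easier to follow than the sed hold space. This is an alternative sketch, not the method used above: timed_out_files is a hypothetical helper name, and it assumes the line layout of the test input (date in field 1, time in field 2, file name without spaces as the last field).

```shell
#!/bin/bash
# timed_out_files is a hypothetical helper; it reads log lines on stdin.
# Assumed layout: "<date> <time> Processing file <name>" and
#                 "<date> <time> Connection timed out".
timed_out_files() {
    awk '
        # Remember the name from the most recent "Processing file" line.
        /Processing file/ { file = $NF; next }
        # Attribute each timeout to the most recently remembered file.
        /Connection timed out/ && file != "" {
            print file " timed out at " $2 " on " $1
        }
    '
}

# Demonstration on a shortened version of the test input.
timed_out_files <<'EOF'
2016-06-24 01:23:45 Processing file xxx
Humpty
2016-06-24 01:23:46 Processing file yyy
wall
2016-06-24 01:23:51 Connection timed out
2016-06-24 01:23:53 Processing file abc
2016-06-24 01:23:59 Connection timed out
EOF
```

With the compressed log from the question, the function would be fed from zcat filename and its output appended to /tmp/logFile.log. If the real log wraps the date in an extra leading character, as the substr call in the question suggests, field 1 would need the same trimming.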
