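Extracting words from a log file

Time: 2018-06-16 02:58:03

Tags: bash awk sed

I am trying to extract job IDs from a log file, and I am having trouble extracting them with bash. I have tried using sed.

This is what my log file looks like: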

> 2018-06-16 02:39:39,331 INFO  org.apache.flink.client.cli.CliFrontend 
> - Running 'list' command.
> 2018-06-16 02:39:39,641 INFO  org.apache.flink.runtime.rest.RestClient                      
> - Rest client endpoint started.
> 2018-06-16 02:39:39,741 INFO  org.apache.flink.client.cli.CliFrontend                       
> - Waiting for response...
>  Waiting for response...
> 2018-06-16 02:39:39,953 INFO  org.apache.flink.client.cli.CliFrontend                       
> - Successfully retrieved list of jobs
> ------------------ Running/Restarting Jobs -------------------
> 15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> servlet with content decompress -> pull from
> collections -> CSV to Avro encode -> Kafka publish (RUNNING)
> 16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> servlet with content decompress -> pull from
> collections -> CSV to Avro encode ->  Kafka publish (RUNNING)
> --------------------------------------------------------------
> 2018-06-16 02:39:39,956 INFO  org.apache.flink.runtime.rest.RestClient                      
> - Shutting down rest endpoint.
> 2018-06-16 02:39:39,957 INFO  org.apache.flink.runtime.rest.RestClient                      
> - Rest endpoint shutdown complete.

I use the following code to extract the lines containing the job IDs:

extractRestResponse=`cat logFile.txt`
echo "extractRestResponse: "$extractRestResponse

w1="------------------ Running/Restarting Jobs -------------------"
w2="--------------------------------------------------------------"
extractRunningJobs="sed -e 's/.*'"$w1"'\(.*\)'"$w2"'.*/\1/' <<< $extractRestResponse"
runningJobs=`eval $extractRunningJobs`
echo "running jobs :"$runningJobs

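However, this does not give me any result. I also noticed that all line breaks are lost when I print the extractRestResponse variable.

I also tried this command, but it gives me no result either: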

extractRestResponse="sed -n '/"$w1"/,/"$w2"/{//!p}' logFile.txt"

3 Answers:

Answer 0: (score: 1)

awk to the rescue!

awk '/^-+$/{f=0} f; /^-+ Running\/Restarting Jobs -+$/{f=1}' logfile
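
Since the goal is the job IDs rather than the whole lines, the same flag logic can be combined with a field separator. The following is only a minimal sketch: the temporary file and the shortened sample log are assumptions made for illustration, and it assumes the ID is always the second field when a line is split on " : ".

```shell
# Hypothetical sample log, shortened from the question; the temp file is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
2018-06-16 02:39:39,953 INFO  org.apache.flink.client.cli.CliFrontend - Successfully retrieved list of jobs
------------------ Running/Restarting Jobs -------------------
15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> Kafka publish (RUNNING)
16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> Kafka publish (RUNNING)
--------------------------------------------------------------
2018-06-16 02:39:39,956 INFO  org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
EOF

# Same flag logic as the one-liner above, but print only field 2 (the job ID)
# for lines that actually contain the " : " separators.
job_ids=$(awk -F' : ' '
  /^-+$/ { f = 0 }                              # closing dashed line: stop
  f && NF >= 3 { print $2 }                     # inside the block: emit job ID
  /^-+ Running\/Restarting Jobs -+$/ { f = 1 }  # header line: start after it
' "$log")

printf '%s\n' "$job_ids"
rm -f "$log"
```

The `NF >= 3` guard skips any wrapped continuation lines inside the block, since those contain no " : " separator.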

Answer 1: (score: 1)

Using sed:

sed -n '/^-* Running\/Restarting Jobs -*/,/^--*/{//!p;}' logFile.txt

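Explanation:

  • By default, sed echoes each input line to standard output after applying the commands. The -n flag suppresses this behavior.
  • /^-* Running\/Restarting Jobs -*/,/^--*/: matches the lines from /^-* Running\/Restarting Jobs -*/ through /^--*/ (inclusive).
  • //!p;: prints every line in that range except the ones matching the addresses. The empty regex // reuses the most recently used regular expression.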
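
The range and the `//!p` idiom can be tried on a small sample. This is a sketch under stated assumptions: the temporary file and the shortened log are made up for illustration, and the second command is a variation (not part of the answer) that assumes job IDs are 32 lowercase hex characters, as they are in the question's log.

```shell
# Hypothetical sample log, shortened from the question; the temp file is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
2018-06-16 02:39:39,953 INFO  org.apache.flink.client.cli.CliFrontend - Successfully retrieved list of jobs
------------------ Running/Restarting Jobs -------------------
15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> Kafka publish (RUNNING)
16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> Kafka publish (RUNNING)
--------------------------------------------------------------
2018-06-16 02:39:39,956 INFO  org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
EOF

# The range from the answer: print what lies between the two dashed lines.
block=$(sed -n '/^-* Running\/Restarting Jobs -*/,/^--*/{//!p;}' "$log")

# Variation (assumes job IDs are 32 lowercase hex characters, as in the sample):
# inside the same range, reduce each line to its ID and print only the lines
# where the substitution succeeded.
ids=$(sed -n '/^-* Running\/Restarting Jobs -*/,/^--*/{
  s/^.* : \([0-9a-f]\{32\}\) : .*$/\1/p
}' "$log")

printf '%s\n' "$block"
printf '%s\n' "$ids"
rm -f "$log"
```

The dashed header and footer lines never match the substitution, so the variation does not need `//!` to exclude them.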

Answer 2: (score: 0)

You can improve your original substitution:

sed -e 's/.*'"$w1"'\(.*\)'"$w2"'.*/\1/' <<< $extractRestResponse

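Use @ as the delimiter, because $w1 contains a / (in Running/Restarting) that would otherwise end the pattern early: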

sed -n "s@.*$w1\(.*\)$w2.*@\1@p" <<< $extractRestResponse

The output is the text between $w1 and $w2:

> 15.06.2018 18:49:44 : 1280dfd7b1de4c74cacf9515f371844b : jETTY HTTP Server -> servlet with content decompress -> pull from > collections -> CSV to Avro encode -> Kafka publish (RUNNING) > 16.06.2018 02:37:07 : aa7a691fa6c3f1ad619b6c0c4425ba1e : jETTY HTTP Server -> servlet with content decompress -> pull from > collections -> CSV to Avro encode -> Kafka publish (RUNNING) >
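
The single-line output reflects what the question observed: expanding a variable without double quotes lets the shell split it on whitespace, so echo re-joins the words with single spaces and the newlines vanish. A minimal sketch of the difference, using a hypothetical two-line variable rather than the real log:

```shell
# Hypothetical sample with an embedded newline (not taken from the log).
text=$(printf 'first line\nsecond line')

unquoted=$(echo $text)     # word splitting: the newline becomes a word boundary
quoted=$(echo "$text")     # double quotes preserve the newline

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

With the newlines preserved, sed receives the log line by line, so a single s command can no longer span from $w1 to $w2; the range-based commands in the other answers do not have that limitation. Whether an unquoted here-string (<<< $var) collapses newlines appears to depend on the bash version, so the one-line result shown here should be treated as version-dependent.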