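I have a log file like the one below. Each line records some string and a thread ID. Each thread belongs to a process, and a process can contain N threads.
Based on the following example, I want to extract (using bash tools such as grep, sed, and so on) all lines of all threads that belong to a given process. Note that the process is mentioned only once, at the top of each thread's sequence of lines: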
line1 thread= 150 process= 200
line2 thread= 152 whatever
line3 thread= 150 whatever
line4 thread= 150 whatever
line5 thread= 130 whatever
line6 thread= 130 process= 200
line7 thread= 150 process= 201
line8 thread= 130 whatever
line9 thread= 130 whatever
For this example, given process 200, the output should be:
line1 thread= 150 process= 200
line3 thread= 150 whatever
line4 thread= 150 whatever
line6 thread= 130 process= 200
line8 thread= 130 whatever
line9 thread= 130 whatever
Answer 0 (score: 0)
awk solution:
The filter_threads.awk script:
#!/bin/awk -f
# Expects lines of the form: lineN thread=150 process=200 (no space after `=`)
function get_thread(s) {    # extracts the thread number from a field such as `thread=150`
    return substr(s, index(s, "=") + 1)    # everything after the first `=`
}
BEGIN {
    pat = "^process=" p "$"    # anchored regex, so that p=200 does not also match process=2000
}
$3 ~ pat {    # on encountering the "process" line of the requested process
    thread = get_thread($2); print; next    # remember the base thread number
}
{
    if (get_thread($2) == thread) print    # print lines whose thread matches the base thread
}
Usage:
awk -f filter_threads.awk -v p=200 yourfile
- where p is the process number
Output:
line1 thread=150 process=200
line3 thread=150 whatever
line4 thread=150 whatever
line6 thread=130 process=200
line8 thread=130 whatever
line9 thread=130 whatever
Update:
Since you changed the initial input, the new solution is as follows:
awk -v p=200 '$4~/process=/ && $5==p{ thread=$3; print; next }$3==thread{ print }' yourfile
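The one-liner above remembers only one thread at a time, so it relies on the lines of different threads of the same process not being interleaved. If they can be, a possible variant is to record the most recently announced process for every thread in an awk array. This is a sketch under assumptions rather than the answer's method: `filter_process` is a hypothetical helper name, the field positions follow the spaced sample from the question, and it assumes a thread belongs to whichever process was last announced for it.

```shell
# filter_process is a hypothetical helper name.
# Usage: filter_process PROCESS_ID [FILE...]   (reads standard input if no file is given)
# Assumes the spaced layout from the question: lineN thread= 150 process= 200
filter_process() {
    wanted=$1
    shift
    awk -v p="$wanted" '
        $4 == "process=" { owner[$3] = $5 }   # remember the latest process announced for this thread
        ($3 in owner) && owner[$3] == p       # print lines whose thread currently belongs to p
    ' "$@"
}
```

For the sample input, `filter_process 200 yourfile` should select line1, line3, line4, line6, line8 and line9, the same lines as the expected output in the question.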