My text file basically has this layout:
Stimulus ...
...
...
...
Response
Stimulus ...
...
...
...
Response
I use sed to grab everything between the two markers, and then extract the information I need from that:
sed -n -e '/Stimulus/,/Response/ p'
However, sometimes a participant does not respond, in which case the file looks like this:
Stimulus ...
...
...
...
Stimulus ...
...
...
...
Response
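In this particular case my script does not return what I want. So I am looking for a way to extract a block if and only if pattern1 is followed by pattern2 and not by another pattern1.
Let me know if this is unclear. I am happy to provide more information.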
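To make the failure concrete, here is a minimal sketch of the problem. The sample lines and the function name naive_range are made up for illustration. A sed range starts at the first Stimulus and only ends at the next Response, so an unanswered Stimulus block is swallowed into the following block.

```shell
# Hypothetical helper: the plain range from the question.
naive_range() {
  sed -n -e '/Stimulus/,/Response/ p' "$1"
}

# Hypothetical sample in which the first Stimulus has no Response.
sample=$(mktemp)
printf '%s\n' 'Stimulus 1' '...' 'Stimulus 2' '...' 'Response 2' > "$sample"

# Prints all five lines, including the unanswered "Stimulus 1" block.
naive_range "$sample"
rm -f "$sample"
```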
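Answer 0 (score: 7)
A dirty way, although it seemed to work in my tests, is to reverse the file content, search from Response to Stimulus, and reverse the result again.
Assume the following input data: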
Stimulus 1...
...
...
...
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
...
...
...
Stimulus 5...
Command:
tac infile | sed -ne '/Response/,/Stimulus/ p' | tac -
Yields:
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Edit: an example that also handles isolated Response parts. Filter twice (based on the OP's comments):
tac infile |
sed -ne '/Response/,/Stimulus/ p' |
tac - |
sed -ne '/Stimulus/,/Response/ p'
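The two-pass pipeline can be tried on a small input that contains both an isolated Response and an unanswered Stimulus. This is only a sketch: the sample lines and the function name stimulus_response_pairs are assumptions made for illustration, and tac is assumed to be available (GNU coreutils).

```shell
# Hypothetical wrapper around the two-pass pipeline.
stimulus_response_pairs() {
  tac "$1" |
    sed -n '/Response/,/Stimulus/p' |  # backwards: each Response up to the nearest Stimulus
    tac |
    sed -n '/Stimulus/,/Response/p'    # forwards: drop Responses that have no Stimulus
}

# Hypothetical sample: "Response 0" is isolated, "Stimulus 1" is unanswered.
sample=$(mktemp)
printf '%s\n' 'Response 0' '...' 'Stimulus 1' '...' \
              'Stimulus 2' '...' 'Response 2' > "$sample"

# Prints only the "Stimulus 2" ... "Response 2" block.
stimulus_response_pairs "$sample"
rm -f "$sample"
```

The first pass alone would still let "Response 0" through, because in the reversed stream its range never meets a Stimulus and therefore runs to the end of the input.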
Answer 1 (score: 5)
Here is a pure bash solution:
tmp=()
while read l; do
    [[ $l =~ ^Stimulus ]] && tmp=("$l") && continue
    [ ${#tmp[@]} -eq 0 ] && continue
    tmp+=("$l")
    [[ $l =~ ^Response ]] && printf "%s\n" "${tmp[@]}" && tmp=()
done <infile
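It starts filling the array tmp when a line starting with Stimulus is found. If another Stimulus arrives, it simply clears tmp and starts over. If a Response is found, it prints the contents of the tmp array. Note that the printf builtin performs an implicit loop over its arguments.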
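The implicit loop can be seen in isolation: printf reuses its format string until all arguments are consumed. A minimal sketch, with made-up argument values:

```shell
# One format, three arguments: the format is applied once per argument,
# so each argument ends up on its own line without an explicit loop.
block=$(printf '%s\n' 'Stimulus 2' '...' 'Response 2')
printf '%s\n' "$block"
```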
Input:
cat >infile <<XXX
...
Response 0
...
Stimulus 1
...
Stimulus 2
...
Response 2
...
Stimulus 3
...
Response 3
...
Response 4
XXX
Output:
Stimulus 2
...
Response 2
Stimulus 3
...
Response 3
Answer 2 (score: 4)
Another option is to switch to perl and its flip-flop (range operator):
perl -lne '
    BEGIN {
        ## Create regular expression to match the initial and final words.
        ($from_re, $to_re) = map { qr/\A$_/ } qw|Stimulus Response|;
    }
    ## Range, similar to "sed".
    if ( $r = ( m/$from_re/o ... m/$to_re/o ) ) {
        ## If inside the range and found the initial word again, remove
        ## all lines saved.
        if ( $r > 1 && m/$from_re/o ) {
            @data = ();
        }
        ## Save line.
        push @data, $_;
        ## At the end of the range, print all lines saved.
        if ( $r =~ m/E0\z/ ) {
            printf qq|%s\n|, join qq|\n|, @data;
            @data = ();
        }
    }
' infile
Assuming this input file:
Stimulus 1...
...
...
...
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
...
...
...
Stimulus 5...
It yields:
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Answer 3 (score: 4)
Here is a pure bash solution that tries to minimize silly side effects:
#!/bin/bash
out=()
while read -r l; do
    case "$l" in
        Stimulus*) out=( "$l" ) ;;
        Response*) ((${#out[@]}!=0)) && { printf "%s\n" "${out[@]}" "$l"; out=(); } ;;
        *) ((${#out[@]}!=0)) && out+=( "$l" ) ;;
    esac
done < infile
It also handles the case where there is a Response but no Stimulus.
Answer 4 (score: 4)
Updated to handle isolated responses:
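awk '
/Response/ {
    if (p == 1) {
        # Print the buffered Stimulus block, then the Response line itself.
        for (k = 1; k <= i; k++) {
            print a[k]
        }
        print $0
    }
    # Reset the buffer, its index, and the "inside a block" flag.
    delete a; i = p = 0
}
/Stimulus/ {
    # A new Stimulus always restarts the buffer.
    delete a; i = 0
    p = 1
}
p { a[++i] = $0 }' log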
Answer 5 (score: 4)
sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
Input file:
Stimulus 1...
bad
bad
bad
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 4...
bad
bad
bad
bad
Stimulus 5...
...
...
...
...
Response 5
bad
bad
bad
bad
Response 6
bad
bad
bad
Output:
$ sed -n 'H;/^Stimulus/{h;d};/^Response/{x;s/^Response//;tk;p;:k;d}' file
Stimulus 2...
...
...
...
Response 2
Stimulus 3...
...
...
...
Response 3
Stimulus 5...
...
...
...
...
Response 5
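The one-liner is dense, so here is the same script spread over several lines with comments, as a reading aid rather than a replacement. The function name extract_pairs and the label name skip are made up for illustration. The commands themselves are the ones from the one-liner.

```shell
# Hypothetical wrapper: the hold-space script, one command per line.
extract_pairs() {
  sed -n '
    # Append every input line to the hold space (the block buffer).
    H
    /^Stimulus/ {
      # A Stimulus restarts the buffer: overwrite the hold space with this line.
      h
      d
    }
    /^Response/ {
      # Swap: the pattern space now contains the buffered block, and the
      # hold space keeps this Response line as the start of the next buffer.
      x
      # A buffer that starts with Response saw no Stimulus since the
      # previous Response, so skip printing it.
      s/^Response//
      t skip
      p
      :skip
      d
    }
  ' "$1"
}

# Usage: extract_pairs file
```

One caveat that follows from the script's logic: lines before the very first Stimulus leave the buffer starting with an empty line, so a Response that appears before any Stimulus would still be printed.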
My GNU awk code:
awk '{a[++i]=$0};/^Response/ && a[1] !~ /^Response/ {for (k=1; k<=i; k++) {print a[k]}};/^Stimulus|^Response/ { delete a; i=0; a[++i]=$0}' file
As you can see, I need far too much awk code...
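For comparison, a shorter variant is possible if the block is kept in a single string instead of an array. This is a sketch under the same assumptions as above (markers at the start of the line), not the answerer's code. The function name pairs_awk and the variable buf are made up for illustration.

```shell
# Hypothetical helper: buffer the current block in one string.
pairs_awk() {
  awk '
    # A Stimulus line (re)starts the buffer, discarding any unanswered block.
    /^Stimulus/ { buf = $0; next }
    # While a block is open, append every line, including the Response.
    buf != ""   { buf = buf "\n" $0 }
    # A Response closes the block: print it only if one was open.
    /^Response/ { if (buf != "") print buf; buf = "" }
  ' "$1"
}

# Usage: pairs_awk file
```

Because the buffer is empty until the first Stimulus, a Response that appears before any Stimulus prints nothing in this variant.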