Problem with a sed command for extracting text between two words

Time: 2013-10-22 19:46:25

Tags: bash unix

My file is a.txt:

this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again

Here I am trying to extract the text between zzz and test:

 cat a.txt | sed -n '/zzz/,/test/p'

Output:

 zzz and ssss
 this is for testing
 so test

The problem is:

cat a.txt | sed -n '/zzz/,/jjj/p'

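When I use a word that does not exist in the file (jjj) as the end pattern, it gives me everything from zzz to the end of the file. Ideally, it should not return anything.

3 answers:

Answer 0: (score: 1)

sed isn't as smart as you'd like it to be. You can use awk instead: once you see the first regex, store the lines. When you hit the second regex, print all the lines you have captured.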

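awk -v regex1="zzz" -v regex2="jjj" '
    $0 ~ regex1 {start=1}                             # first pattern seen: start capturing
    start {lines = lines $0 ORS}                      # buffer every line from there on
    start && $0 ~ regex2 {printf "%s", lines; exit}   # second pattern seen: print buffer and stop
' a.txt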
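To try the buffering approach above against the sample file, here is a minimal sketch. The function name extract_between is hypothetical and only used for illustration. Note that a plain test also matches the word testing, so this sketch anchors the end pattern as test$ to stop on the line "so test".

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf '%s\n' 'this is for testing' 'so test' 'againa and again' \
    'zzz and ssss' 'this is for testing' 'so test' 'againa and again' > a.txt

# extract_between START_REGEX END_REGEX FILE  (hypothetical helper name)
# Buffers lines from the first START_REGEX match onward and prints the
# buffer only if END_REGEX is actually seen; otherwise prints nothing.
extract_between() {
    awk -v regex1="$1" -v regex2="$2" '
        $0 ~ regex1 {start = 1}
        start {lines = lines $0 ORS}
        start && $0 ~ regex2 {printf "%s", lines; exit}
    ' "$3"
}

extract_between 'zzz' 'test$' a.txt   # prints the block from zzz to "so test"
extract_between 'zzz' 'jjj' a.txt     # end pattern never appears: prints nothing
```

Because the lines are only printed once the second pattern is reached, a missing end pattern leaves the buffer unprinted, which is the behavior the question asks for.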

Answer 1: (score: 0)

grep -oP would be a better choice:

$ grep -oP 'zzz[\s\S]*test' a.txt 
zzz and ssss
this is for testing
so test

grep -oP 'zzz[\s\S]*jjj' a.txt
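One caveat: grep matches one line at a time by default, so a pattern such as zzz[\s\S]*test can only span several lines if the whole file is read as a single record. A minimal sketch, assuming GNU grep built with PCRE support (-P) and its -z option; the helper name multiline_grep is hypothetical:

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf '%s\n' 'this is for testing' 'so test' 'againa and again' \
    'zzz and ssss' 'this is for testing' 'so test' 'againa and again' > a.txt

# multiline_grep PATTERN FILE  (hypothetical helper name)
# -z: records end at NUL instead of newline, so the file is one record
# -o: print only the matched text
# -P: Perl-compatible regex, needed for [\s\S]
# With -z, each match is written with a trailing NUL; tr turns it into a newline.
multiline_grep() {
    grep -ozP "$1" "$2" | tr '\0' '\n'
}

multiline_grep 'zzz[\s\S]*test' a.txt   # prints the block from zzz to "so test"
multiline_grep 'zzz[\s\S]*jjj' a.txt    # no match: prints nothing
```

Since [\s\S]* is greedy, the match runs to the last occurrence of test after zzz, which in this file is the line "so test".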

Answer 2: (score: 0)

Another solution, just for fun:

[ ~]$ awk 'BEGIN{b=e=0; s=es=""} 
      ($0 ~ "^zzz.*"){b=1} 
      ($0 ~ ".*test$"){e=1; b=0; es=s; s=""; if(es!=""){es=es"\n"$0}else{es=$0}} 
      (b==1){if(s!=""){s=s"\n"$0}else{s=$0}} END {print es}' file

Output with the same input file:

zzz and ssss
this is for testing
so test

If you replace ".*test$" with another pattern that does not match anything in the input file, this command produces no output:

[ ~]$ awk 'BEGIN{b=e=0; s=es=""} 
      ($0 ~ "^zzz.*"){b=1} 
      ($0 ~ ".*jjj$"){e=1; b=0; es=s; s=""; if(es!=""){es=es"\n"$0}else{es=$0}} 
      (b==1){if(s!=""){s=s"\n"$0}else{s=$0}} END {print es}' file
[ ~]$

Of course, you can easily make the regexes configurable with the "-v" option.

Also, anubhava's grep command does not work on my laptop:
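As a rough sketch of that "-v" variant: the function name extract_last_block and the variable names start_re and end_re are hypothetical. It is a simplified rewrite and not the exact command above: it only records a block when the start pattern was seen first, and it skips the print in END when nothing was captured.

```shell
#!/bin/sh
# Recreate the sample file from the question.
printf '%s\n' 'this is for testing' 'so test' 'againa and again' \
    'zzz and ssss' 'this is for testing' 'so test' 'againa and again' > a.txt

# extract_last_block START_REGEX END_REGEX FILE  (hypothetical helper name)
# Prints the last complete block that starts at START_REGEX and ends at
# END_REGEX; prints nothing if no block is ever completed.
extract_last_block() {
    awk -v start_re="$1" -v end_re="$2" '
        # Start pattern: begin buffering lines.
        $0 ~ start_re { collecting = 1 }
        # While buffering, append the current line to the buffer.
        collecting { buf = (buf == "" ? $0 : buf "\n" $0) }
        # End pattern while buffering: keep the finished block and reset.
        collecting && $0 ~ end_re { result = buf; buf = ""; collecting = 0 }
        END { if (result != "") print result }
    ' "$3"
}

extract_last_block '^zzz' 'test$' a.txt   # prints the block from zzz to "so test"
extract_last_block '^zzz' 'jjj$' a.txt    # end pattern never appears: prints nothing
```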

[neumann@MacBookPro ~]$ cat file
this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again
[neumann@MacBookPro ~]$ grep -oP 'zzz[\s\S]*test' file
[neumann@MacBookPro ~]$ grep --version
grep (GNU grep) 2.14
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
[neumann@MacBookPro ~]$ 

That is why, when I have a multi-line pattern, I do this with awk.