Question

My file is a.txt:

this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again

Here I am trying to extract the text between zzz and test:

 cat a.txt | sed -n '/zzz/,/test/p'

Output:

 zzz and ssss
 this is for testing
 so test

The problem is:

cat a.txt | sed -n '/zzz/,/jjj/p'

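When I use an end pattern that does not exist in the file (jjj), sed gives me everything from zzz to the end of the file. Ideally, it should not return anything.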

Answer 1

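sed is not as smart as you would like it to be. You can use awk instead: once you see the first regex, start storing lines. When you hit the second regex, print all the lines you have captured.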

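awk -v regex1="zzz" -v regex2="jjj" '
    # start buffering at the first line that matches regex1
    $0 ~ regex1 {start=1}
    start {lines = lines $0 ORS}
    # print the buffer only if regex2 is actually reached
    start && $0 ~ regex2 {printf "%s", lines; exit}
' a.txt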
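As a usage sketch, the same awk program can be wrapped in a small shell function and run against a recreated copy of a.txt. The function name `extract_between` and the temporary file are assumptions made for this illustration, not part of the original answer. Note that the plain regex `test` also matches `testing`, so the sketch anchors the end pattern as `test$` to stop at the line "so test".

```shell
#!/bin/sh
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again
EOF

# extract_between START_REGEX END_REGEX FILE   (hypothetical helper name)
# Buffers lines from the first START_REGEX match onward and prints the
# buffer only if a line matching END_REGEX is reached.
extract_between() {
    awk -v regex1="$1" -v regex2="$2" '
        $0 ~ regex1 {start=1}
        start {lines = lines $0 ORS}
        start && $0 ~ regex2 {printf "%s", lines; exit}
    ' "$3"
}

found=$(extract_between 'zzz' 'test$' "$tmp")    # end pattern exists
missing=$(extract_between 'zzz' 'jjj' "$tmp")    # end pattern never appears

printf 'found:\n%s\n' "$found"
printf 'missing is empty: %s\n' "$([ -z "$missing" ] && echo yes || echo no)"

rm -f "$tmp"
```

Because nothing is printed until the end pattern is seen, a missing end pattern yields empty output instead of everything up to the end of the file.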

Answer 2

grep -oP would be a better option:

$ grep -oP 'zzz[\s\S]*test' a.txt 
zzz and ssss
this is for testing
so test

grep -oP 'zzz[\s\S]*jjj' a.txt
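One caveat worth spelling out: by default grep matches one line at a time, so a pattern that has to span newlines generally needs the whole file treated as a single record. The sketch below assumes GNU grep built with PCRE support (`-P`) and adds the `-z` option for that purpose. The temporary file and the variable names are assumptions for illustration only.

```shell
#!/bin/sh
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again
EOF

# -z makes grep read NUL-separated records, so the whole file is one record
# and [\s\S]* can cross newlines. With -o and -z each match is followed by
# a NUL byte, which tr turns into a newline.
grep_found=$(grep -ozP 'zzz[\s\S]*test' "$tmp" | tr '\0' '\n')
grep_missing=$(grep -ozP 'zzz[\s\S]*jjj' "$tmp" | tr '\0' '\n')

printf 'found:\n%s\n' "$grep_found"
printf 'missing is empty: %s\n' "$([ -z "$grep_missing" ] && echo yes || echo no)"

rm -f "$tmp"
```

The `*` here is greedy, so the match runs to the last `test` after `zzz`. If the file contained several candidate end lines, `[\s\S]*?` would stop at the first one instead.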

Answer 3

Another solution, just for fun:

[ ~]$ awk 'BEGIN{b=e=0; s=es=""}
      ($0 ~ "^zzz.*"){b=1}
      (b==1 && $0 ~ ".*test$"){e=1; b=0; es=s; s=""; if(es!=""){es=es"\n"$0}else{es=$0}}
      (b==1){if(s!=""){s=s"\n"$0}else{s=$0}} END {if(es!=""){print es}}' file

Output with the same input file:

zzz and ssss
this is for testing
so test

If you replace ".*test$" with a pattern that does not match anything in the input file, the command produces no output:

[ ~]$ awk 'BEGIN{b=e=0; s=es=""}
      ($0 ~ "^zzz.*"){b=1}
      (b==1 && $0 ~ ".*jjj$"){e=1; b=0; es=s; s=""; if(es!=""){es=es"\n"$0}else{es=$0}}
      (b==1){if(s!=""){s=s"\n"$0}else{s=$0}} END {if(es!=""){print es}}' file
[ ~]$

Of course, you can easily make the regular expressions configurable with the "-v" option.
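A minimal sketch of that idea follows, with the two patterns passed in through `-v`. The variable names `start_re` and `end_re`, the function name `last_block`, and the temporary file are assumptions introduced for this illustration. The sketch only treats the end pattern as a match once the start pattern has been seen, and it prints nothing when no complete block was found.

```shell
#!/bin/sh
# Recreate the sample file used in this answer in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again
EOF

# last_block START_REGEX END_REGEX FILE   (hypothetical helper name)
# Prints the last complete block that starts at a START_REGEX line and
# ends at an END_REGEX line; prints nothing if no block was completed.
last_block() {
    awk -v start_re="$1" -v end_re="$2" '
        $0 ~ start_re { b = 1 }                     # begin collecting
        b { s = (s == "" ? $0 : s "\n" $0) }        # append the current line
        b && $0 ~ end_re { es = s; s = ""; b = 0 }  # block is complete
        END { if (es != "") print es }
    ' "$3"
}

block_found=$(last_block '^zzz' 'test$' "$tmp")
block_missing=$(last_block '^zzz' 'jjj$' "$tmp")

printf 'found:\n%s\n' "$block_found"
printf 'missing is empty: %s\n' "$([ -z "$block_missing" ] && echo yes || echo no)"

rm -f "$tmp"
```

Unlike a version that exits at the first match, this one keeps scanning, so with several zzz ... test blocks in the file it reports the last complete one.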

As an aside, anubhava's grep command does not work on my laptop:

[neumann@MacBookPro ~]$ cat file
this is for testing
so test
againa and again
zzz and ssss
this is for testing
so test
againa and again
[neumann@MacBookPro ~]$ grep -oP 'zzz[\s\S]*test' file
[neumann@MacBookPro ~]$ grep --version
grep (GNU grep) 2.14
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
[neumann@MacBookPro ~]$

That is why I use awk whenever I have to match a multi-line pattern.

Problem with a sed command for extracting text between two words
