Matching an expression across multiple lines in a shell script

Time: 2017-01-22 05:24:52

Tags: regex shell scripting grep

I want to match a pattern across multiple lines in a shell script. My input is:

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n1 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n2 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

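I am trying to display the output for a specific ID (for example, n1 or n2) using only a regular expression. I tried the regex START(.|\n)*ID: n1(.|\n)*END, but it also picks up the data for ID: n2. What changes should I make to the regex to get the data for one specific ID only?

I am using cat inputfile | grep 'pattern' > outputfile as the command.

The number of lines in each block, as well as the number of lines between START and ID: n1 and between ID: n1 and END, can vary, so using head/tail is not a viable option. Also, I want to print the whole block from START to END when the ID matches.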

Edit: I tried an online regex creator, where the regex

START[\s\S][^END]*ID: n1[\s\S][^END]*END

successfully matches against my input file.
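One likely reason this pattern behaves unexpectedly outside the online tool: [^END] is a bracket expression that matches a single character other than E, N, or D, not "any text except the word END". In addition, grep without special options tests one line at a time, so a pattern meant to span lines has nothing to span. The snippet below is only a small illustration of the bracket-expression point; the sample strings are made up.

```shell
# [^END] matches ONE character that is not E, N, or D.
# A line made only of such characters matches ^[^END]*$; a line containing D does not.
# grep -c prints the number of matching lines (and exits non-zero when it is 0,
# hence the "|| true").
with_d=$(printf 'some DATA here\n' | grep -c '^[^END]*$' || true)
without=$(printf 'some text here\n' | grep -c '^[^END]*$' || true)

printf 'line containing D:    %s matching line(s)\n' "$with_d"
printf 'line without E, N, D: %s matching line(s)\n' "$without"
```

So any block whose data happens to contain an uppercase E, N, or D stops the [^END]* part early, regardless of where the END line is.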

3 answers:

Answer 0: (score: 1)

With awk in paragraph mode, using two consecutive newlines as the record separator:

awk -v RS='\n\n' '/ID: n1/' file.txt

Replace n1 with n2, n3, and so on for the other IDs.

Example:

$ cat file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n1/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n2/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n3/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END
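One thing to watch for: a bare pattern such as /ID: n1/ would also match a block whose ID is n10 or n11. The sketch below is one way to require an exact ID. It is not part of the answer above: print_block is a hypothetical helper name, the sample data is invented, and it uses the POSIX paragraph-mode setting RS= instead of RS='\n\n'. It also assumes the ID contains no regex metacharacters.

```shell
# Hypothetical helper: print the blank-line-separated block whose ID matches exactly.
# $1 = ID to look for (e.g. n1), $2 = input file
print_block() {
    # RS= (empty) makes awk read paragraph by paragraph.
    # The ID must be followed by a space, a newline, or the end of the record.
    awk -v id="$1" -v RS= '$0 ~ ("(^|\n)ID: " id "( |\n|$)")' "$2"
}

# Made-up sample data: n1 and n10 share a prefix.
sample=$(mktemp)
printf 'START a\nID: n1 x\nEND\n\nSTART b\nID: n10 y\nEND\n' > "$sample"

result=$(print_block n1 "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this sample, only the n1 block is printed; the n10 block is skipped because the character after "n1" is neither a space nor a line end.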

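Answer 1: (score: 1)

A GNU awk / Mawk solution that can handle any number of lines, including empty lines, between paired START and END occurrences:

awk -v id='n2' -v RS='(^|\n)START |\nEND' ' $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" } ' file

This solution uses a multi-character RS value (which is also a regular expression), which the POSIX spec does not support. Both GNU awk and Mawk (the default awk on Ubuntu) support such values, whereas BSD/macOS awk does not.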

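  • -v id='n2' passes the ID value n2 to awk as variable id.

  • RS='(^|\n)START |\nEND' splits the input into records made up of the (multi-line) text between the token START at the beginning of the input or of a line, and the token END following a newline.

  • $0 ~ ("\nID: " id " ") matches each input record ($0) against (~) a regex that identifies the ID of interest: a newline followed by ID: , followed by the ID value of interest (stored in variable id) and a space. Note how string concatenation works in awk: string literals and variable references are simply placed next to each other.

  • If there is a match, print "START " $0 "\nEND" prints the input record at hand, bookended by the START and END tokens (which, being the input record separators, are not reported as part of $0).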
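The steps above can be tried on a small input in which the blocks contain blank lines, which is the case this record separator is meant to handle. This is only a sketch: the sample data is invented, and it assumes an awk that accepts a regex as RS (such as GNU awk or Mawk, per the answer).

```shell
# Made-up sample: each block contains an empty line between START and END.
sample=$(mktemp)
printf 'START a\nline 1\n\nID: n1 x\nEND\n\nSTART b\nline 2\n\nID: n2 y\nline 3\nEND\n' > "$sample"

# Records are the text between "START " and "\nEND"; the separators are
# stripped by awk, so they are added back when printing.
result=$(awk -v id='n2' -v RS='(^|\n)START |\nEND' '
    $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The n2 block should come out whole, including its internal blank line, because blank lines play no role in how the records are split here.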

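If the lines between paired START and END occurrences are all non-empty (that is, each contains at least 1 character, even if that character is a space or tab), a POSIX-compliant awk solution is possible:

awk -v id='n2' -v RS= '$0 ~ ("\nID: " id " ")' file

Note that -v RS=, i.e. setting the input record separator (RS) to the empty string, is an awk idiom that splits the input into records by paragraphs (runs of non-empty lines).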

Answer 2: (score: 0)

In awk, you can accumulate the text between the start pattern and the end pattern, then test that buffer for a match:

cat inputfile | awk  '/^START/        { buf=$0 "\n"; flag=1; next } 
                      flag            { buf=buf $0 "\n" } 
                      /^END/ && flag  { flag=0; if (buf ~ /ID: n1 |ID: n2 /) print buf }'

In Perl you can do this:

cat inputfile | perl -0777 -lne 'while (/(^START.*?^ID: (n\d+) .*?^END)/gms){
    if ($2 eq "n1" || $2 eq "n2"){
        print "$1\n\n";
    }
}'

In either case, you may want to use awk '{script}' inputfile or perl '{script}' inputfile rather than cat.
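As an illustration of that last point, here is a sketch of the awk buffering approach that reads the file directly and takes the ID as a variable. The function name print_block_buffered and the sample data are invented for this example, and it assumes the ID contains no regex metacharacters.

```shell
# Hypothetical helper: buffer each START..END block, print it if its ID matches.
# $1 = ID to look for (e.g. n1), $2 = input file
print_block_buffered() {
    awk -v id="$1" '
        /^START/          { buf = $0; inblock = 1; next }  # begin a fresh buffer
        inblock           { buf = buf "\n" $0 }            # accumulate lines, END included
        /^END/ && inblock {
            inblock = 0
            if (buf ~ ("\nID: " id " ")) print buf         # print only the matching block
        }
    ' "$2"
}

# Made-up sample data with a blank line inside the first block.
sample=$(mktemp)
printf 'START a\nline 1\n\nID: n1 x\nEND\n\nSTART b\nID: n2 y\nEND\n' > "$sample"

result=$(print_block_buffered n1 "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because this version never sets RS, it relies only on standard awk features and does not depend on blank lines separating the blocks.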