I want to match a pattern across multiple lines in a shell script. My input is:
START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n1 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END
START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n2 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END
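I am trying to display the output for one specific ID (e.g., n1 or n2) using only a regular expression. I tried the regex START(.|\n)*ID: n1(.|\n)*END, but it also picks up the data for ID: n2. What should I change in the regex to get the data for a specific ID only?
I am using cat inputfile | grep 'pattern' > outputfile as the command.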
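The number of lines in each block, as well as the number of lines between START and ID: n1 and between ID: n1 and END, can vary, so using head/tail is not a viable option. Also, I want to print the whole block from START to END when the ID matches.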
Edit: I tried the Online Regex Creator, where the regex
START[\s\S][^END]*ID: n1[\s\S][^END]*END
matches successfully on my input file.
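Answer 0 (score: 1)
With awk, using two consecutive newlines as the record separator (which assumes that the blocks are separated by an empty line):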
awk -v RS='\n\n' '/ID: n1/' file.txt
Replace n1 with n2, n3, or any other ID.
Example:
$ cat file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END
$ awk -v RS='\n\n' '/ID: n1/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END
$ awk -v RS='\n\n' '/ID: n2/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END
$ awk -v RS='\n\n' '/ID: n3/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END
Answer 1 (score: 1)
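A GNU awk or Mawk solution that can handle any number of lines, including empty ones, between paired occurrences of START and END:
awk -v id='n2' -v RS='(^|\n)START |\nEND' '
  $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
' file
This solution uses a multi-character RS value (which is also a regex), which the POSIX spec does not support. Both GNU awk and Mawk (the default awk on Ubuntu) support such values, whereas BSD/macOS awk does not.
-v id='n2' passes the ID value n2 to Awk as variable id.
RS='(^|\n)START |\nEND' splits the input into records by the (multi-line) text between the token START at the start of the input or of a line and the token END following a newline.
$0 ~ ("\nID: " id " ") matches each input record ($0) against (~) a regex that matches the specified ID: a newline followed by ID: , followed by the ID value of interest (stored in variable id) and a space.
Note how string concatenation works in Awk: simply place strings and variable references next to each other.
If a match is found, print "START " $0 "\nEND" prints the input record at hand, bookended by the START and END tokens (which, as input record separators, aren't reported as part of $0).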
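To show the -v and string-concatenation technique in isolation, here is a minimal sketch that lists only the ID: lines for a given ID. The helper name list_id_lines is hypothetical, and it assumes the ID line starts at the beginning of a line, as in the sample input. It uses only POSIX awk features. Note that the value of id becomes part of a regex, so an ID containing regex metacharacters would need escaping first.

```shell
# list_id_lines ID FILE  (hypothetical helper, not part of the answer above)
# Prints every line of FILE that starts with "ID: <ID> ".
list_id_lines() {
  # "^ID: " id " " is built by plain juxtaposition of strings and the variable;
  # the trailing space keeps n1 from also matching n10.
  awk -v id="$1" '$0 ~ ("^ID: " id " ")' "$2"
}
```

For instance, list_id_lines n1 file would print an ID: n1 line but skip an ID: n10 line, because the pattern requires a space right after the ID.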
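If the lines between paired START and END occurrences are all non-empty (i.e., each contains at least 1 character, even if that character is a space or tab), there is also a POSIX-compliant awk solution:
awk -v id='n2' -v RS= '$0 ~ ("\nID: " id " ")' file
Note that -v RS=, i.e., setting the input record separator (RS) to the empty string, is an awk idiom that breaks the input into records by paragraphs (runs of non-empty lines), so the blocks must be separated from each other by at least one empty line.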
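The paragraph-mode idiom can be tried out with a small self-contained sketch. The sample data and the helper name paragraph_by_id are made up for illustration; the sketch assumes that blocks are separated by one empty line and contain no empty lines themselves.

```shell
# paragraph_by_id ID FILE  (hypothetical helper)
# RS= (empty) turns on paragraph mode: each run of non-empty lines is one record.
paragraph_by_id() {
  awk -v id="$1" -v RS= '$0 ~ ("\nID: " id " ")' "$2"
}

# Build a throwaway sample file (made-up data) and query it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
START a
some line
ID: n1 foo
END

START b
ID: n2 bar
some line
END
EOF

paragraph_by_id n2 "$sample"   # prints the second block, START b through END
rm -f "$sample"
```

If a block contained an empty line, paragraph mode would split it into two records and the match would no longer cover the whole block, which is the case the multi-character RS variant is meant for.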
Answer 2 (score: 0)
In awk, you can accumulate the text between the start and end patterns and then test that buffer for the match:
cat inputfile | awk '/^START/ { buf=$0 "\n"; flag=1; next }
flag { buf=buf $0 "\n" }
/^END/ && flag { flag=0; if (buf ~ /ID: n1 |ID: n2 /) print buf }'
In Perl you can do:
cat inputfile | perl -0777 -lne 'while (/(^START.*?^ID: (n\d+) .*?^END)/gms){
if ($2 eq "n1" || $2 eq "n2"){
print "$1\n\n";
}
}'
In either case, you may prefer awk '{script}' inputfile or perl '{script}' inputfile rather than using cat.
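As a sketch of that cat-free form, the buffer approach can be wrapped in a shell function that takes the ID and the file name as arguments. The name print_block and the variable inblock are hypothetical; the sketch assumes each block starts with a line beginning with START and ends with a line beginning with END, as in the question. It relies only on POSIX awk features, so it should also work where a multi-character RS is unavailable.

```shell
# print_block ID FILE  (hypothetical helper)
# Prints every START..END block of FILE whose ID line matches ID.
print_block() {
  awk -v id="$1" '
    /^START/ { buf = ""; inblock = 1 }      # a new block begins: reset the buffer
    inblock  { buf = buf $0 "\n" }          # collect every line of the block
    /^END/ && inblock {                     # block complete: test it, then print
      inblock = 0
      if (buf ~ ("\nID: " id " ")) printf "%s", buf
    }
  ' "$2"
}
```

Calling print_block n1 inputfile > outputfile would then replace the cat inputfile | awk ... pipeline, and empty lines inside a block are kept because lines are collected one by one.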