Question

I want to match a pattern across multiple lines in a shell script. My input is:

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n1 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n2 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

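I am trying to display the output for one specific ID (for example, n1 or n2) using only a regular expression. I tried the regex START(.|\n)*ID: n1(.|\n)*END, but it also picks up the data for ID: n2. What should I change in the regex so that it returns the data for a specific ID only?

The command I am using is cat inputfile | grep 'pattern' > outputfile.

The number of lines in each block, as well as the number of lines between START and ID: n1 and between ID: n1 and END, can vary, so using head/tail is not a viable option. Also, when the ID matches, I want to print the entire block from START to END.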

Edit: I tried an Online Regex Creator, where the regex

START[\s\S][^END]*ID: n1[\s\S][^END]*END

successfully matches on my input file.

Answer 1

Use awk in paragraph mode, with two consecutive newlines as the record separator:

awk -v RS='\n\n' '/ID: n1/' file.txt

Replace n1 with n2, n3, and so on to select the other blocks.

Example:

$ cat file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END

$ awk -v RS='\n\n' '/ID: n1/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

$ awk -v RS='\n\n' '/ID: n2/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

$ awk -v RS='\n\n' '/ID: n3/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END
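One caveat: /ID: n1/ is an unanchored match, so it would also select a block whose ID is n10 or n12. The sketch below shows one possible way to guard against that, assuming the ID is followed either by a space or by the end of the line, as in the sample input. The function name print_block and the variable id are illustrative and not part of the original answer, and the sketch uses RS= (POSIX paragraph mode) with one field per line instead of RS='\n\n'.

```shell
# Hypothetical helper (name is an assumption): print every block whose
# ID line is exactly "ID: <id>", optionally followed by a space and more data.
# Usage: print_block n1 file.txt
print_block() {
  awk -v RS= -F '\n' -v id="$1" '
    {
      # In paragraph mode with FS set to newline, each field is one line.
      for (i = 1; i <= NF; i++) {
        if ($i == "ID: " id || index($i, "ID: " id " ") == 1) {
          print        # print the whole block once
          next
        }
      }
    }
  ' "$2"
}
```

Because the comparison is a literal string test rather than a regex, an ID such as n1 does not match a line starting with ID: n10.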

Answer 2

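A GNU awk or Mawk solution that can handle any number of lines, including empty ones, between paired START and END occurrences:

awk -v id='n2' -v RS='(^|\n)START |\nEND' '
$0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
' file

Note: this solution uses a multi-character RS value (which is also a regular expression), which the POSIX spec does not support. GNU awk and Mawk (the default awk on Ubuntu) both support such values, whereas BSD/macOS awk does not.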

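-v id='n2' passes the ID value n2 to Awk as the variable id.
RS='(^|\n)START |\nEND' splits the input into records consisting of the (possibly multi-line) text between a "START " token at the start of the input or of a line and an END token that follows a newline.
$0 ~ ("\nID: " id " ") matches each input record ($0) against (~) a regex that matches the specified ID: a newline followed by "ID: ", followed by the ID value of interest (stored in the variable id), followed by a space. Note how string concatenation works in Awk: you simply place the strings and variable references next to each other.
If the record matches, print "START " $0 "\nEND" prints the input record at hand, bookended by the START and END tokens (which, being the input record separators, are not reported as part of $0).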

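If the lines between paired START and END occurrences are all non-empty (that is, each contains at least 1 character, even if that character is a space or a tab), a POSIX-compliant awk solution is possible as well:

awk -v id='n2' -v RS= '$0 ~ ("\nID: " id " ")' file

Note that -v RS=, which sets the input record separator (RS) to the empty string, is an awk idiom that splits the input into records by paragraphs (runs of non-empty lines).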
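To illustrate why this variant needs every line inside a block to be non-empty, here is a small sketch that counts the records awk sees in paragraph mode. The helper name count_paragraphs is made up for this illustration.

```shell
# Hypothetical helper: count the records awk sees when RS is the empty string.
count_paragraphs() {
  awk -v RS= 'END { print NR }'
}

# Two well-formed blocks separated by a blank line: prints 2.
printf 'START a\nID: n1 x\nEND\n\nSTART b\nID: n2 y\nEND\n' | count_paragraphs

# One block with an empty line inside it: also prints 2, because the
# empty line splits the block into two separate records.
printf 'START a\n\nID: n1 x\nEND\n' | count_paragraphs
```

In the second case the record that contains ID: n1 no longer includes the START line, so the paragraph-mode command would print only part of the block.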

Answer 3

In awk, you can accumulate the text between the start pattern and the end pattern, and then test that buffer for a match:

cat inputfile | awk  '/^START/        { buf=$0 "\n"; flag=1; next } 
                      flag            { buf=buf $0 "\n" } 
                      /^END/ && flag  { flag=0; if (buf ~ /ID: n1 |ID: n2 /) print buf }'

In Perl, you can do this:

cat inputfile | perl -0777 -lne 'while (/(^START.*?^ID: (n\d+) .*?^END)/gms){
    if ($2 eq "n1" || $2 eq "n2"){
        print "$1\n\n";
    }
}'

In either case, you may prefer awk '{script}' inputfile or perl '{script}' inputfile rather than using cat.
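Following that suggestion, the awk version can be written without cat and with the ID passed in as a variable. This is only a sketch under a few assumptions: the function name block_by_id and the variables inblock and found are invented for illustration, and the ID is compared as a literal string followed by a space or the end of the line rather than as a regex.

```shell
# Hypothetical wrapper around the buffer approach.
# Usage: block_by_id n1 inputfile
block_by_id() {
  awk -v id="$1" '
    /^START/ { buf = ""; found = 0; inblock = 1 }   # a new block begins
    inblock {
      buf = buf $0 "\n"                             # collect every line, empty ones too
      if ($0 == "ID: " id || index($0, "ID: " id " ") == 1)
        found = 1                                   # this block carries the wanted ID
    }
    /^END/ && inblock {                             # block is complete
      if (found) printf "%s", buf
      inblock = 0
    }
  ' "$2"
}
```

Since this reads line by line and does not depend on RS, it should also work with awk implementations that lack regex record separators, and empty lines inside a block are preserved.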
