Question

I have a text file containing several hundred terms in the following format:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name2  
xref: type1:aba  
xref: type3:fee

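I need to extract all terms that have an xref of type1 and write them to a new file in the same format. I was planning to use a regular expression like this: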

/\[Term\](.*)type1(.*)[^\[Term\]]/g

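to find the matching terms, but I don't know how to make a regular expression search across multiple lines. Should I read the original text file as a single string or line by line? Any help is much appreciated.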

Answer 1

Try this regular expression:

/(?s)\[Term\].*?xref: type1.*?(?=\[Term\])/g

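This regular expression differs from yours in the following notable ways:

(?s) turns on "dot matches newline", so . can cross line boundaries
.*? is a non-greedy match; a greedy .* would consume the file up to the last [Term]
removed the unnecessary groups around .*?
added a minor refinement to match xref: type1 rather than just type1
removed the trailing term marker
added a lookahead to match up to (but not including) the next [Term] marker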

Answer 2

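Another approach is to set the $/ variable so that the input is read in blocks separated by blank lines, split each block into lines at the newlines, and run a regular expression against each line. As soon as one line matches, print the block and read the next one. An example as a one-liner: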

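perl -ne '
    BEGIN { $/ = q|| }                  # empty $/ means paragraph mode: one blank-line-separated block per read
    my @lines = split /\n/;             # split the current block ($_) into lines
    for my $line ( @lines ) {
        if ( $line =~ m/xref:\s*type1/ ) {
            printf qq|%s|, $_;          # print the whole block, not just the matching line
            last;                       # one match is enough, go on to the next block
        }
    }
' infile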

Given an input file such as:

[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name1
xref: type6:aba
xref: type3:fee

[Term]
id: id2
name: name1
xref: type1:aba
xref: type3:fee

[Term]
id: id2
name: name1
xref: type4:aba
xref: type3:fee

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee

It yields:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee 

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee

As you can see, only the terms that contain an xref: type1 line are printed.

Finding and extracting multi-line matches with a Perl regular expression
