I need to enclose strings that are not yet enclosed in a pair of delimiters. Example text:
Some text or random characters here. {% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}Another
bla-bla
random text outside of the delimiters - called
as "free text".
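I need to enclose every occurrence of free text with
{%ORIG .... original free text ... %}
and leave the already enclosed strings unmodified. So in the example above, two pieces of free text need to be enclosed, and the result should be: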
{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla
random text outside of the delimiters - called
as "free text".%}
So the opening delimiter is {% and the closing delimiter is %}.
Question:
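Answer 0 (score: 6)
You can do this with a regular expression, with the help of recursive subpattern calls like (?R). (?R) re-enters the whole pattern, so a nested {% ... %} group is matched as one unit.
For example: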
$_ = <<'_STR_';
Some text or random characters here. {% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}Another
bla-bla
random text outside of the delimiters - called
as "free text".
_STR_
s/
    ( \{% (?R)* %\} )                 # group 1: a balanced {% ... %} group, possibly nested
  |
    ( (?: (?! \{% | %\} ) . )+ )      # group 2: a run of free text containing no delimiter
/
    defined($1) ? $1 : "{%ORIG $2%}"  # keep balanced groups as they are, enclose free text
/gsex;
print;
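Output (the heredoc's trailing newline belongs to the last run of free text, so the final %} lands on its own line):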
{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla
random text outside of the delimiters - called
as "free text".
%}
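The same substitution can be wrapped in a subroutine that takes the delimiters as parameters. This is only a sketch: the name wrap_free_text and its parameters are hypothetical and not part of the answer above, and it assumes the delimiters in the input are balanced.

```perl
use strict;
use warnings;

# Hypothetical helper: enclose every run of free text in "<open>ORIG ...<close>"
# and leave balanced <open> ... <close> groups untouched.
sub wrap_free_text {
    my ($text, $open, $close) = @_;

    # Escape regex metacharacters so the delimiters match literally.
    my ($o, $c) = map { quotemeta($_) } ($open, $close);

    $text =~ s/
        ( $o (?R)* $c )              # group 1: balanced group, recursing into the whole pattern
      |
        ( (?: (?! $o | $c ) . )+ )   # group 2: free text up to the next delimiter
    /
        defined($1) ? $1 : $open . 'ORIG ' . $2 . $close
    /gsex;

    return $text;
}

print wrap_free_text('free{% kept {%nested%} %}more', '{%', '%}'), "\n";
# prints: {%ORIG free%}{% kept {%nested%} %}{%ORIG more%}
```

Passing the delimiters through quotemeta keeps the pattern valid for delimiters such as [[ and ]] that contain regex metacharacters.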
Answer 1 (score: 5)
Jonathan Leffler's suggestion is right. You can use the Text::Balanced module and its extract_tagged function to solve this problem:
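#!/usr/bin/env perl
use warnings;
use strict;
use Text::Balanced qw<extract_tagged>;

my ($open_delim, $close_delim) = qw( {% %} );

# Slurp the whole input: extract_tagged works on a single scalar.
my $text = do { local $/ = undef; <> };
chomp $text;

while (1) {
    # List context returns: [0] extracted block, [1] remainder, [2] skipped prefix.
    # The prefix pattern consumes the free text up to the next opening delimiter.
    my @r = extract_tagged($text, $open_delim, $close_delim, '(?s).*?(?=\{%)', undef);

    # Free text found in front of a delimited block: enclose it.
    if (length $r[2]) {
        printf qq|%sORIG %s%s|, $open_delim, $r[2], $close_delim;
    }

    if (length $r[0]) {
        # Block that is already delimited: print it unchanged.
        printf qq|%s|, $r[0];
    }
    else {
        # No delimited block left: enclose whatever remains and stop.
        if (length $r[1]) {
            printf qq|%sORIG %s%s|, $open_delim, $r[1], $close_delim;
        }
        last;
    }

    $text = $r[1];
}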
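The program loops until there are no more delimiters in the text. On each iteration it checks the prefix (the text up to the opening delimiter, $r[2]) and surrounds it with delimiters, and it prints text that is already enclosed ($r[0]) as is. When no delimited block is left, it encloses the remainder ($r[1]) and leaves the loop.
At the start I slurp the whole content of the file, because this function only works on scalars. You should take a look at the documentation to see what the function returns. I hope this gives you an idea that helps with your problem in case it is much more complex than this example.
Just to test, run it like:
perl script.pl infile
That yields: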
{%ORIG Some text or random characters here. %}{% Another random string
enclosed in a pair of delimiters as next {% what can be deeply
nested {% as {%here%}%} end of delimited %} text. %}{%ORIG Another
bla-bla
random text outside of the delimiters - called
as "free text".%}
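To make the three return values used in the loop easier to see, here is a minimal sketch of a single extract_tagged call on a short string. The sample string and the variable names are made up for illustration. The call uses the same prefix pattern as the script, with the brace escaped.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Made-up sample: free text, a nested delimited block, more free text.
my $text = 'free{% kept {%nested%} %}more';

# In list context the first three elements are the extracted block,
# the remainder after it, and the prefix that was skipped before it.
my ($extracted, $remainder, $prefix) =
    extract_tagged($text, '{%', '%}', '(?s).*?(?=\{%)');

print "prefix:    [$prefix]\n";
print "extracted: [$extracted]\n";
print "remainder: [$remainder]\n";
```

According to the module documentation, the input string is not modified in list context, which is why the loop in the script assigns $r[1] back to $text before the next iteration.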