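Here is a simplified example of the string I want to parse:
$my_string = "000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444"
I want to retrieve the content between AAA and ZZZ, but I want to ignore the content inside nested AAA / ZZZ pairs. So in the example above I want 111 and 222 (they are numbers in this example, but they could be any alphanumeric text other than AAA or ZZZ), and I ignore 333 because it sits inside a nested AAA / ZZZ pair. There can be any number of nested AAA / ZZZ levels. For example:
$my_string2 = "AAA 1 AAA 2 AAA 3 AAA 4 ZZZ ZZZ ZZZ ZZZ"
In this second example, I only want 1.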
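Answer 0 (score: 4)
Here is an example of recursive parsing. In this case you are only interested in the level 1 content.
** Added an example segment that parses either all cores or only level 1 cores (to speed things up).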
# (?s)(?:((?&content))|AAA((?&core)|)ZZZ|((?:AAA|ZZZ)))(?(DEFINE)(?<core>(?>(?&content)|AAA(?:(?=.)(?&core)|)ZZZ)+)(?<content>(?>(?!(?:AAA|ZZZ)).)+))
# //////////////////////////////////////////////////////
# // The General Guide to 3-Part Recursive Parsing
# // ----------------------------------------------
# // Part 1. CONTENT
# // Part 2. CORE
# // Part 3. ERRORS
(?s)
(?:
     (                          # (1), Take off CONTENT
          (?&content)
     )
  |                             # OR
     AAA                        # Start-Delimiter
     (                          # (2), Take off The CORE
          (?&core)
       |
     )
     ZZZ                        # End-Delimiter
  |                             # OR
     (                          # (3), Take off Unbalanced (delimiter) ERRORS
          (?: AAA | ZZZ )
     )
)
# ///////////////////////
# // Subroutines
# // ---------------
(?(DEFINE)
     # core
     (?<core>
          (?>
               (?&content)
            |
               AAA
               # recurse core
               (?:
                    (?= . )
                    (?&core)
                 |
               )
               ZZZ
          )+
     )
     # content
     (?<content>
          (?>
               (?!
                    (?: AAA | ZZZ )
               )
               .
          )+
     )
)
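To make the three parts concrete, here is a minimal sketch that applies the pattern above once at the top level and labels each match by the capture group that fired. The names `$re` and `classify` are made up for this illustration and are not part of the answer; the pattern is the same one, only compiled with /x so that it can be laid out over several lines.

```perl
use strict;
use warnings;

# Same pattern as above, compiled with /x so the layout whitespace is ignored.
my $re = qr/
    (?s)
    (?:
        ( (?&content) )                 # (1) CONTENT
      | AAA ( (?&core) | ) ZZZ          # (2) CORE (may be empty)
      | ( (?: AAA | ZZZ ) )             # (3) unbalanced delimiter
    )
    (?(DEFINE)
        (?<core>    (?> (?&content) | AAA (?: (?= . ) (?&core) | ) ZZZ )+ )
        (?<content> (?> (?! (?: AAA | ZZZ ) ) . )+ )
    )
/x;

# Hypothetical helper: label every top-level match by the group that matched.
sub classify {
    my ($text) = @_;
    my @tokens;
    while ( $text =~ /$re/g ) {
        if    ( defined $1 ) { push @tokens, "CONTENT[$1]" }
        elsif ( defined $2 ) { push @tokens, "CORE[$2]" }
        else                 { push @tokens, "ERROR[$3]" }
    }
    return @tokens;
}

# The trailing ZZZ has no matching AAA, so it is reported through group 3.
print "$_\n" for classify("000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444 ZZZ");
```

Each CORE token holds the raw text between one balanced outer AAA / ZZZ pair, nested pairs included. Feeding that text back into the same loop is what descends one level.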
Perl code:
use strict;
use warnings;
$/ = undef;
my $content = <DATA>;
# Set the error mode on/off here ..
my $BailOnError = 1;
my $IsError = 0;
my @vals = ();
my $level = 0;
ParseCore( $content );
print "\n@vals";
exit;
sub ParseCore
{
    my ($core) = @_;
    while ( $core =~ /(?s)(?:((?&content))|AAA((?&core)|)ZZZ|((?:AAA|ZZZ)))(?(DEFINE)(?<core>(?>(?&content)|AAA(?:(?=.)(?&core)|)ZZZ)+)(?<content>(?>(?!(?:AAA|ZZZ)).)+))/g )
    {
        if (defined $1)
        {
            # CONTENT
            if ( $level == 1 ) {
                push @vals, $1;
            }
        }
        elsif (defined $2)
        {
            # CORE
            my $k = $2;
            # To parse all cores:
            # -----------------------
            # ++$level;
            # ParseCore( $k );
            # --$level;
            # To parse just level 1 cores:
            # ----------------------------------
            if ( $level == 0 ) {
                ++$level;
                ParseCore( $k );
                --$level;
            }
            if ( $BailOnError && $IsError ) {
                last;
            }
        }
        else
        {
            # ERRORS
            print "Unbalanced '$3' at position = ", $-[0];
            $IsError = 1;
            # Decide to continue here ..
            # If BailOnError is set, just unwind recursion.
            # -------------------------------------------------
            if ( $BailOnError ) {
                last;
            }
        }
    }
}
#================================================
__DATA__
000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444
AAA 1 AAA 2 AAA 3 AAA 4 ZZZ ZZZ ZZZ ZZZ
Output:
111 222 1
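For comparison, the same level 1 values can also be collected without a recursive pattern by counting nesting depth. This is a different technique from the one in the answer, shown only as a rough sketch: `level_one_values` is a hypothetical helper name, and the sketch assumes that surrounding whitespace should be trimmed and that whitespace-only pieces should be dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): collect the text found at
# nesting depth 1, using a depth counter instead of a recursive pattern.
sub level_one_values {
    my ($text) = @_;
    my ( $depth, @vals ) = (0);
    # A capturing group in split keeps the delimiters in the token list.
    for my $tok ( split /(AAA|ZZZ)/, $text ) {
        if ( $tok eq 'AAA' ) {
            $depth++;
        }
        elsif ( $tok eq 'ZZZ' ) {
            die "Unbalanced 'ZZZ'\n" if $depth == 0;
            $depth--;
        }
        elsif ( $depth == 1 ) {
            # Assumption: trim whitespace and skip pieces that end up empty.
            ( my $v = $tok ) =~ s/^\s+|\s+$//g;
            push @vals, $v if length $v;
        }
    }
    die "Unbalanced 'AAA'\n" if $depth != 0;
    return @vals;
}

print join( ' ', level_one_values("000 AAA 111 ZZZ AAA 222 AAA 333 ZZZ ZZZ 444") ), "\n";   # 111 222
print join( ' ', level_one_values("AAA 1 AAA 2 AAA 3 AAA 4 ZZZ ZZZ ZZZ ZZZ") ), "\n";       # 1
```

Unlike the recursive pattern, this version does not report the position of an unbalanced delimiter; it simply dies on the first one it meets.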