我正在尝试识别代码所在的列表数据:
my $listdata = '
List Items:
(1)LIST 1 data
(a)sub data
(b)sub data
(c)sub data
(d)sub data
(i)sub-sub data
(ii)sub-sub data
(A)sub-sub-sub data
(B)sub-sub-sub data
(iii)sub-sub data
(e)sub data
(2)LIST 2 data
(3)LIST 3 data
';
#print "\n\n\n$listdata\n\n";
###Array of multi-level patterns
my @level_check =('\(\d+\)','(?<!\()\d+\)','\([a-h]\)','(?<!\()[a-h]\)','\([A-H]\)','(?<!\()[A-H]\)','\d+\.',
'\([IVX]+\)','(?<!\()[IVX]+\)','\([ivx]+\)','(?<!\()[ivx]+\)','\-');
###pattern for each levels
my ($first_level,$second_level,$third_level,$fourth_level);
###First from each pattern
my ($first_occur,$second_occur,$third_occur,$fourth_occur);
#++++++++++++++++++++++++Pattern for multilevel list+++++++++++++++++++++++#
my $pattern = '((?:[IVX\-\(\)\d\.\-][a-z]?\)?)+)';
$listdata =~ s{$pattern}{
my ($leveltemp) = ($1);
$first_occur = $leveltemp if !$first_occur;
#print "$data";
#print "all_level: $leveltemp##\n";
#########First Level Start
for($i=0; $i<scalar(@level_check);$i++){
if($first_occur =~ /^$level_check[$i]$/){
$first_level = $level_check[$i] if !$first_level;
#print "$level_check[$i] =>is Ist: $first_level\n";
}
}
for($i=0; $i<scalar(@level_check);$i++){
if($leveltemp =~ /^$first_level$/){
$leveltemp =~ s{$pattern}{<<LIST1>>$2$3};
#print"**$data level matched: $leveltemp => $first_level\n";
############First Level End
}
else
{
######Second level Start
if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([a-h]{3,})/i){
$second_occur = $leveltemp if !$second_occur;
#print "$leveltemp :$second_occur\n";
for($i=0; $i<scalar(@level_check);$i++){
if($second_occur =~ /^$level_check[$i]$/){
$second_level = $level_check[$i] if !$second_level;
#print "$leveltemp =>is IInd: $second_level\n";
}
}
if($leveltemp =~ /^$second_level/){
$leveltemp =~ s{$pattern}{<<LIST2>>$2$3};
#print"**level matched: $leveltemp => $seconf_level\n";
######Second level End
}
else
{
########Third Level Start
if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([A-h]{3,})/i){
$third_occur = $leveltemp if !$third_occur;
for($i=0; $i<scalar(@level_check);$i++){
if($third_occur =~ /^$level_check[$i]$/){
$third_level = $level_check[$i] if !$third_level;
#print "$leveltemp =>is IIIrd: $third_level\n";
}
}
if($leveltemp =~ /^$third_level/){
$leveltemp =~ s{$pattern}{<<LIST3>>$2$3};
#print"**level matched: $leveltemp => $third_level\n";
#########Third Level End
}
else
{
########Fourth Level Start
if($leveltemp !~ /^(?:<<LIST+>>|\d{3,}\,?|\([A-z]{3,})/i){
$fourth_occur = $leveltemp if !$fourth_occur;
#print "$leveltemp :$fourth_occur\n";
for($i=0; $i<scalar(@level_check);$i++){
if($fourth_occur =~ /^$level_check[$i]$/){
$fourth_level = $level_check[$i] if !$fourth_level;
#print "$leveltemp =>is IVrth: $fourth_level\n";
}
}
if($leveltemp =~ /^$fourth_level/){
$leveltemp =~ s{$pattern}{<<LIST4>>$2$3};
#print"**$fourth_occur level matched: $leveltemp => $fourth_level\n";
#########Fourth Level End
}
#######Add Next Levels Here If Any in else loop
}
}#IV lvl else loop end
}
}#III lvl else loop end
}
}#IInd lvl else loop end
}#Ist lvl for loop end
"$leveltemp"
}gsixe;
print "$listdata\n";
需要输出:
<<LIST1>>(1)LIST 1 data
<<LIST2>>(a)sub data
<<LIST2>>(b)sub data
<<LIST2>>(c)sub data
<<LIST2>>(d)sub data
<<LIST3>>(i)sub-sub data
<<LIST3>>(ii)sub-sub data
<<LIST4>>(A)sub-sub-sub data
<<LIST4>>(B)sub-sub-sub data
<<LIST3>>(iii)sub-sub data
<<LIST2>>(e)sub data
<<LIST1>>(2)LIST 2 data
<<LIST1>>(3)LIST 3 data
问题是我必须为每个级别输入代码。我在这里编码了四个级别。但这不是解决方案(List可能有任意数量的子级别)。 有没有其他方法可以为此编写短代码,涵盖列表中所有可能的子级别。列表再次是动态的。 列表可以以下列任何一种格式开始= A)(A)1。1)(1)a)(a)i)(i)。
答案 0 :(得分:1)
使用堆栈跟踪“打开”样式,以确定新样式是儿童还是父母。
use strict;
use warnings;
my @styles = (
'\(\d+\)', '\d+\)', '\d+\.',
'\([a-h]\)', '[a-h]\)', '\([A-H]\)', '[A-H]\)',
'\([IVX]+\)', '[IVX]+\)', '\([ivx]+\)', '[ivx]+\)',
'-',
);
my @stack;
while (<>) {
for my $i (reverse 0..$#stack) {
if (/$stack[$i]/) {
splice(@stack, $i+1);
goto DONE_LINE;
}
}
for my $style (@styles) {
if (my ($spaces) = /^( *)$style/) {
push @stack, qr/^$spaces$style/;
goto DONE_LINE;
}
}
die "Unrecognized format at line $. - $_";
DONE_LINE:
s/^ *//;
printf("<<LIST%d>>%s", 0+@stack, $_);
}
为避免反复重新编译相同的正则表达式,请添加
my %re_cache = map { $_ => qr/^( *)$_/ } @styles;
并更改
/^( *)$style/
到
/$re_cache{$style}/
答案 1 :(得分:0)
尝试逐行处理。以下标识每个列表项所在的级别。人们只需要跟踪前一级别以确定某些东西是否为子级,以及每个先前级别的最大值以验证事物的顺序是否正确:
use strict;
use warnings;
###Array of multi-level patterns
my @level_check = (
'\(\d+\)',
'(?<!\()\d+\)',
'\([a-h]\)',
'(?<!\()[a-h]\)',
'\([A-H]\)',
'(?<!\()[A-H]\)',
'\d+\.',
'\([IVX]+\)',
'(?<!\()[IVX]+\)',
'\([ivx]+\)',
'(?<!\()[ivx]+\)',
'\-',
);
while (<DATA>) {
chomp(my $line = $_);
my $match = 0;
for my $i (0..$#level_check) {
if ($line =~ /^\s*$level_check[$i]/) {
$match = $i + 1;
last;
}
}
if ($match) {
print "Level $match - $line\n";
} else {
print "No Match - $line\n";
}
}
1;
__END__
(1)LIST 1 data
(a)sub data
(b)sub data
(c)sub data
(d)sub data
(i)sub-sub data
(ii)sub-sub data
(A)sub-sub-sub data
(B)sub-sub-sub data
(iii)sub-sub data
(e)sub data
(2)LIST 2 data
(3)LIST 3 data
打印
Level 1 - (1)LIST 1 data
Level 3 - (a)sub data
Level 3 - (b)sub data
Level 3 - (c)sub data
Level 3 - (d)sub data
Level 10 - (i)sub-sub data
Level 10 - (ii)sub-sub data
Level 5 - (A)sub-sub-sub data
Level 5 - (B)sub-sub-sub data
Level 10 - (iii)sub-sub data
Level 3 - (e)sub data
Level 1 - (2)LIST 2 data
Level 1 - (3)LIST 3 data