识别动态列表项

时间:2013-12-23 06:55:44

标签: perl list search identify

我正在尝试识别代码所在的列表数据:

my $listdata = '
List Items:     
(1)LIST 1 data 
(a)sub data
(b)sub data
(c)sub data
(d)sub data
    (i)sub-sub data
    (ii)sub-sub data
        (A)sub-sub-sub data
        (B)sub-sub-sub data
    (iii)sub-sub data
(e)sub data
(2)LIST 2 data 
(3)LIST 3 data 
';

    #print "\n\n\n$listdata\n\n";

    ###Array of multi-level patterns 
    my @level_check =('\(\d+\)','(?<!\()\d+\)','\([a-h]\)','(?<!\()[a-h]\)','\([A-H]\)','(?<!\()[A-H]\)','\d+\.',
                      '\([IVX]+\)','(?<!\()[IVX]+\)','\([ivx]+\)','(?<!\()[ivx]+\)','\-');

    ###pattern for each levels
    my ($first_level,$second_level,$third_level,$fourth_level);

    ###First from each pattern
    my ($first_occur,$second_occur,$third_occur,$fourth_occur);

    #++++++++++++++++++++++++Pattern for multilevel list+++++++++++++++++++++++#
    my $pattern = '((?:[IVX\-\(\)\d\.\-][a-z]?\)?)+)';

    $listdata =~ s{$pattern}{
        my ($leveltemp) = ($1);
        $first_occur = $leveltemp if !$first_occur;

        #print "$data";
        #print "all_level: $leveltemp##\n";

        #########First Level Start
        for($i=0; $i<scalar(@level_check);$i++){
            if($first_occur =~ /^$level_check[$i]$/){
                $first_level = $level_check[$i] if !$first_level;
                #print "$level_check[$i] =>is Ist: $first_level\n";
            }
        }

        for($i=0; $i<scalar(@level_check);$i++){

            if($leveltemp =~ /^$first_level$/){
                $leveltemp =~ s{$pattern}{<<LIST1>>$2$3};
                #print"**$data level matched: $leveltemp => $first_level\n";
                ############First Level End
            }
            else
            {
                ######Second level Start
                if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([a-h]{3,})/i){
                    $second_occur = $leveltemp if !$second_occur;
                    #print "$leveltemp :$second_occur\n";

                    for($i=0; $i<scalar(@level_check);$i++){
                        if($second_occur =~ /^$level_check[$i]$/){
                        $second_level = $level_check[$i] if !$second_level;
                        #print "$leveltemp =>is IInd: $second_level\n";
                        }
                    }

                    if($leveltemp =~ /^$second_level/){
                        $leveltemp =~ s{$pattern}{<<LIST2>>$2$3};
                        #print"**level matched: $leveltemp => $seconf_level\n";
                        ######Second level End
                    }
                    else
                    {
                        ########Third Level Start   
                        if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([A-h]{3,})/i){
                            $third_occur = $leveltemp if !$third_occur;

                            for($i=0; $i<scalar(@level_check);$i++){
                                if($third_occur =~ /^$level_check[$i]$/){
                                    $third_level = $level_check[$i] if !$third_level;
                                    #print "$leveltemp =>is IIIrd: $third_level\n";
                                }
                            }

                            if($leveltemp =~ /^$third_level/){
                                $leveltemp =~ s{$pattern}{<<LIST3>>$2$3};
                                #print"**level matched: $leveltemp => $third_level\n";
                            #########Third Level End
                            }
                            else
                            {
                                ########Fourth Level Start  
                                if($leveltemp !~ /^(?:<<LIST+>>|\d{3,}\,?|\([A-z]{3,})/i){

                                    $fourth_occur = $leveltemp if !$fourth_occur;
                                        #print "$leveltemp :$fourth_occur\n";
                                    for($i=0; $i<scalar(@level_check);$i++){
                                        if($fourth_occur =~ /^$level_check[$i]$/){
                                            $fourth_level = $level_check[$i] if !$fourth_level;
                                            #print "$leveltemp =>is IVrth: $fourth_level\n";
                                        }
                                    }

                                    if($leveltemp =~ /^$fourth_level/){
                                        $leveltemp =~ s{$pattern}{<<LIST4>>$2$3};
                                        #print"**$fourth_occur  level matched: $leveltemp => $fourth_level\n";
                                        #########Fourth Level End
                                    }
                                    #######Add Next Levels Here If Any in else loop


                                }
                            }#IV lvl else loop end
                        }   
                    }#III lvl else loop end
                }
            }#IInd lvl else loop end

        }#Ist lvl for loop end

        "$leveltemp"
    }gsixe;

print "$listdata\n";

需要输出:

 <<LIST1>>(1)LIST 1 data 
 <<LIST2>>(a)sub data
 <<LIST2>>(b)sub data
 <<LIST2>>(c)sub data
 <<LIST2>>(d)sub data
 <<LIST3>>(i)sub-sub data
 <<LIST3>>(ii)sub-sub data
 <<LIST4>>(A)sub-sub-sub data
 <<LIST4>>(B)sub-sub-sub data
 <<LIST3>>(iii)sub-sub data
 <<LIST2>>(e)sub data
 <<LIST1>>(2)LIST 2 data 
 <<LIST1>>(3)LIST 3 data

问题是我必须为每个级别输入代码。我在这里编码了四个级别。但这不是解决方案(List可能有任意数量的子级别)。 有没有其他方法可以为此编写短代码,涵盖列表中所有可能的子级别。列表再次是动态的。 列表可以以下列任何一种格式开始= A)(A)1。1)(1)a)(a)i)(i)。

2 个答案:

答案 0 :(得分:1)

使用堆栈跟踪“打开”样式,以确定新样式是儿童还是父母。

use strict;
use warnings;

my @styles = (
    '\(\d+\)',     '\d+\)',     '\d+\.',
    '\([a-h]\)',   '[a-h]\)',   '\([A-H]\)',   '[A-H]\)',
    '\([IVX]+\)',  '[IVX]+\)',  '\([ivx]+\)',  '[ivx]+\)',
    '-',
);

my @stack;
while (<>) {
   for my $i (reverse 0..$#stack) {
      if (/$stack[$i]/) {
         splice(@stack, $i+1);
         goto DONE_LINE;
      }
   }

   for my $style (@styles) {
      if (my ($spaces) = /^( *)$style/) {
         push @stack, qr/^$spaces$style/;
         goto DONE_LINE;
      }
   }

   die "Unrecognized format at line $. - $_";

DONE_LINE:
   s/^ *//;
   printf("<<LIST%d>>%s", 0+@stack, $_);
}

为避免反复重新编译相同的正则表达式,请添加

my %re_cache = map { $_ => qr/^( *)$_/ } @styles;

并更改

/^( *)$style/

/$re_cache{$style}/

答案 1 :(得分:0)

尝试逐行处理。以下标识每个列表项所在的级别。人们只需要跟踪前一级别以确定某些东西是否为子级,以及每个先前级别的最大值以验证事物的顺序是否正确:

use strict;
use warnings;


###Array of multi-level patterns 
my @level_check = (
    '\(\d+\)',
    '(?<!\()\d+\)',
    '\([a-h]\)',
    '(?<!\()[a-h]\)',
    '\([A-H]\)',
    '(?<!\()[A-H]\)',
    '\d+\.',
    '\([IVX]+\)',
    '(?<!\()[IVX]+\)',
    '\([ivx]+\)',
    '(?<!\()[ivx]+\)',
    '\-',
);

while (<DATA>) {
    chomp(my $line = $_);

    my $match = 0;
    for my $i (0..$#level_check) {
        if ($line =~ /^\s*$level_check[$i]/) {
            $match = $i + 1;
            last;
        }
    }

    if ($match) {
        print "Level $match - $line\n";
    } else {
        print "No Match - $line\n";
    }
}

1;

__END__
(1)LIST 1 data 
(a)sub data
(b)sub data
(c)sub data
(d)sub data
    (i)sub-sub data
    (ii)sub-sub data
        (A)sub-sub-sub data
        (B)sub-sub-sub data
    (iii)sub-sub data
(e)sub data
(2)LIST 2 data 
(3)LIST 3 data 

打印

Level 1 - (1)LIST 1 data
Level 3 - (a)sub data
Level 3 - (b)sub data
Level 3 - (c)sub data
Level 3 - (d)sub data
Level 10 -     (i)sub-sub data
Level 10 -     (ii)sub-sub data
Level 5 -         (A)sub-sub-sub data
Level 5 -         (B)sub-sub-sub data
Level 10 -     (iii)sub-sub data
Level 3 - (e)sub data
Level 1 - (2)LIST 2 data
Level 1 - (3)LIST 3 data