How to renumber index page numbers sequentially in HTML using Perl

Time: 2015-07-09 08:38:11

Tags: perl

The index page appears in the manuscript like this:

Index

ab ovo 5,98

Abweichung 56,78,1,38f.,5

Akrostichon 1,789f.,24,3f.,45,985,788,45f.,125,10,121,128,413

Allegorese 451,892,333,454,155

Allegorie 451,782f.,311,344-354,788f.,58,8,19,110,115,12

I use the following script to create the links:

opendir( DIR, "." ) || die "Cant open Input Dir";
@input = readdir(DIR);
my %link;
my $text2;
@inp = grep ( /html$/i, @input );


foreach $file (@inp) {

    $file =~ s/.html//g;
    print "\n$file";
    open( MIN, "$file\.html" ) || die "Cant open $fname merging file";
    my $text;
    { local $/; $text = <MIN>; }

    while ( $text =~ m/ id="y([^"]+)"/gs ) {
        $link{$1} = $file;
    }
}
close(DIR);
opendir( DIR1, "." ) || die "Cant open Input Dir";
@input1 = readdir(DIR1);
@inp1 = grep ( /html$/i, @input1 );
mkdir( Final, 0777 );
foreach $file1 (@inp1) {
    $file1 =~ s/.html//g;
    open( IN, "$file1\.html" ) || die "Cant open $fname merging file";
    open( OUT, ">Final\\$file1\.html" )
        || die "Cant open $fname merging file";
    my $text1;
    { local $/; $text1 = <IN>; }
    print "\n$file1";


    $text1 =~ s/([0-9]+)([A-z]+)/<a href="#page_$1">$1<\/a>$2/g;
    $text1 =~ s/([0-9]+)/<a href="#page_$1">$1<\/a>/g;
    $text1
        =~ s/([0-9]+)&#x2013;([0-9]+)/<a href="#page_$1">$1<\/a>&#x2013;<a href="#page_$2">$2<\/a>/g;


    print OUT $text1;

The output is:

<p class="primary">Allegorie <a href="#page_451">451</a>, <a href="#page_782">782</a>f., <a href="#page_311">311</a>, <a href="#page_344">344</a>&#x2013;<a href="#page_354">354</a>, <a href="#page_788">788</a>f., <a href="#page_58">58</a>, <a href="#page_8">8</a>, <a href="#page_19">19</a>, <a href="#page_110">110</a>, <a href="#page_115">115</a>, <a href="#page_12">12</a></p>

But the client wants the displayed page numbers changed as follows:

Index

ab ovo 1,2

Abweichung 1,2,3,4f,5

Akrostichon 1,2f.,3,4f.,5,6,7,8f.,9,10,11,12,13

Allegorese 1,2,3,4,5

Allegorie 1,2f.,3,4,5,6,7,7,8,9,10,11,12

The output should be:

<p class="primary">Allegorie <a href="#page_451">1</a>, <a href="#page_782">2</a>f., <a href="#page_311">3</a>, <a href="#page_344">4</a>&#x2013;<a href="#page_354">5</a>, <a href="#page_788">6</a>f., <a href="#page_58">7</a>, <a href="#page_8">8</a>, <a href="#page_19">9</a>, <a href="#page_110">10</a>, <a href="#page_115">11</a>, <a href="#page_12">12</a></p>

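How can I change the index page numbers using Perl?

Could someone please help me solve this?

1 answer:

Answer 0 (score: 0)

OK, leaving aside tidying up your code, it looks like your core problem is: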

<p class="primary">Allegorie <a href="#page_451">1</a>

vs.

<p class="primary">Allegorie <a href="#page_451">451</a>, 

The way I'd suggest doing this is to change your regular expressions:

    $text1 =~ s/([0-9]+)([A-z]+)/<a href="#page_$1">$1<\/a>$2/g;
    $text1 =~ s/([0-9]+)/<a href="#page_$1">$1<\/a>/g;
    $text1  =~ s/([0-9]+)&#x2013;([0-9]+)/<a href="#page_$1">$1<\/a>&#x2013;<a href="#page_$2">$2<\/a>/g;

Add a counter variable:

my $count = 1; 

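Then use $count++ as the link text in the replacement instead of $1 and $2, keeping $1 and $2 in the href so each link still points at the real page. Because $count++ is Perl code rather than plain text, the substitution needs the /e modifier so that the replacement is evaluated as an expression. Reset $count to 1 at the start of each index entry if every entry should start again from 1.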
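As a minimal sketch of that idea, assuming each index entry is processed as its own string: the helper name renumber_entry is hypothetical and not part of the original script. It also collapses the three substitutions into one pass, which is a swapped-in simplification, so that digits inside the newly inserted hrefs are never matched again. The lookbehind is an assumption to keep the 2013 in &#x2013; from being treated as a page number.

```perl
use strict;
use warnings;

# Hypothetical helper: link every page number in one index entry.
# The href keeps the real page number; the visible text is a running counter.
sub renumber_entry {
    my ($entry) = @_;
    my $count = 1;    # restart at 1 for each index entry

    # (?<![\w#]) skips digits that follow a word character or '#',
    # so the "2013" inside "&#x2013;" is left alone.
    # /e evaluates the replacement as Perl code, so $count++ actually runs.
    $entry =~ s{(?<![\w#])(\d+)}
               { '<a href="#page_' . $1 . '">' . $count++ . '</a>' }ge;

    return $entry;
}

my $entry = '<p class="primary">Allegorie 451, 782f., 311, 344&#x2013;354</p>';
print renumber_entry($entry), "\n";
```

A suffix such as the f. after 782 and the en dash entity between a page range pass through unchanged, because only the digit runs are replaced. An entry whose markup already contains standalone digits (for example in an attribute value) would need a stricter pattern or a real parser.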

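I would point out, though, that stepping back a bit and using a proper HTML parser may be a better approach: HTML does not lend itself to being parsed with regular expressions.

Also: please turn on use strict; and use warnings;. They are a good way to avoid some of the worse pitfalls of Perl programming.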
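To illustrate what strict and warnings buy you here, the following is a rough sketch of the script's first loop rewritten with lexical variables and three-argument open. The sub names slurp and collect_ids are hypothetical, and the directory argument is an assumption (the original always reads the current directory). Under strict, the undeclared $fname in the original die messages would be reported at compile time instead of silently printing nothing.

```perl
use strict;
use warnings;

# Read a whole file into one string. The path appears in the error message,
# which the original die calls could not do because $fname was never set.
sub slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    local $/;    # slurp mode: read the entire file in one go
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Map every id="y..." value to the base name of the file that contains it.
sub collect_ids {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open directory $dir: $!";
    my @files = grep { /\.html$/i } readdir($dh);
    closedir($dh);

    my %link;
    for my $file (@files) {
        my $text = slurp("$dir/$file");
        ( my $base = $file ) =~ s/\.html$//i;    # escaped dot, anchored at the end
        while ( $text =~ / id="y([^"]+)"/g ) {
            $link{$1} = $base;
        }
    }
    return %link;
}
```

Calling my %link = collect_ids('.'); would reproduce the original lookup table for the current directory.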