Perl regular expression replacement

Time: 2014-02-09 20:51:12

Tags: html perl pattern-matching

In an HTML file, I want to add links to the numbers inside [] for patterns like these: [1], [1-2], [1,3,6], or [1,3,4- 6,9]. I know how to match the first, simple kind:

$html =~ s`\[(\d+)\]`[<a href="#$1">$1</a>]`g;

What about the other patterns?

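Also, for a pattern like [1], I want to find only a number that sits between [ and ], ignoring anything else during matching and replacement, because in my HTML the numbers inside [] may have different HTML markup associated with them.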

Thanks!

2 Answers:

Answer 0: (score: 3)

Updated

Algorithm:

use strict;
use warnings;
use Data::Dumper;

# input
my $html = "[1],[1-2],[1,3,6],[1,3,4-6],[5]";

# result array
my @result;

# build an HTML anchor tag for a single number
sub asHtmlLink {
    return "<a href=\"\#$_[0]\">$_[0]</a>";
}

# separate by ,
foreach (split(',', $html)) {

    # match a number, optionally followed by "-" and a second number;
    # skip any piece that contains no number at all
    next unless m/(\d+)(-)?(\d+)?/;

    # if there is a digit after a minus
    if (defined $3) {
        push(@result, asHtmlLink($1).$2.asHtmlLink($3));
    } else {
        push(@result, asHtmlLink($1));
    }
}

# dump
print Dumper @result;

Result:

$VAR1 = '<a href="#1">1</a>';
$VAR2 = '<a href="#1">1</a>-<a href="#2">2</a>';
$VAR3 = '<a href="#1">1</a>';
$VAR4 = '<a href="#3">3</a>';
$VAR5 = '<a href="#6">6</a>';
$VAR6 = '<a href="#1">1</a>';
$VAR7 = '<a href="#3">3</a>';
$VAR8 = '<a href="#4">4</a>-<a href="#6">6</a>';
$VAR9 = '<a href="#5">5</a>';

Regex link: Regex101
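The code above collects the links in an array rather than rewriting the HTML itself. As a minimal sketch of the in-place variant, assuming each bracket group contains only digits, commas, hyphens and whitespace, the same per-number linking can be applied inside a substitution. The names link_numbers and link_citations are hypothetical helpers, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every run of digits in an anchor tag.
sub link_numbers {
    my ($inner) = @_;
    $inner =~ s{(\d+)}{<a href="#$1">$1</a>}g;
    return $inner;
}

# Hypothetical helper: rewrite only bracket groups that consist of digits,
# commas, hyphens and whitespace and contain at least one digit.
# Anything else in the string, including other brackets, is left alone.
sub link_citations {
    my ($html) = @_;
    $html =~ s/\[([\d,\s-]*\d[\d,\s-]*)\]/'[' . link_numbers($1) . ']'/ge;
    return $html;
}

print link_citations("[1-2]"), "\n";   # prints [<a href="#1">1</a>-<a href="#2">2</a>]
```

Separators such as "," and "-" are preserved as written, so a group like [1,3,4- 6,9] keeps its original spacing and only the numbers change.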

Answer 1: (score: 0)

use strict;
use Carp::Assert 'assert';

sub replace {
  my $html = shift;

  # first expand subexpressions of the form "n-m" (the assert
  # below fails if n > m!)
  1 while $html =~ s{
                     (\[\s*(?:\s*\d+\s*,\s*)*)    # prefix (1)
                     (\d+)\s*-\s*(\d+)            # target (2, 3)
                     (?=[^\]]*\])                 # suffix (lookahead)
                    }
                    {assert($2 <= $3),
                     $1 . join(", ", $2..$3)}ex;

  # now replace individual integers with hyperlinks
  1 while $html =~ s{
                     (\[\s*(?:\s*\d+\s*,\s*)*)    # prefix (1)
                     (\d+)                        # target (2)
                     (?=[^\]]*\])                 # suffix (lookahead)
                    }
                    {$1\n<a href="#$2">$2</a>}x;  # \n added to make the
                                                  # output easier to
                                                  # read; it's safe
                                                  # to omit it


  return $html;
}
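The first substitution in replace relies on two Perl features: the /e modifier, which evaluates the replacement as code, and the range operator "..", which produces the list n..m. A minimal sketch of just that step in isolation, assuming a plain comma-separated list without the surrounding brackets; expand_ranges is a hypothetical name, and it dies directly instead of using Carp::Assert:

```perl
use strict;
use warnings;

# Hypothetical helper: expand every "n-m" into "n, n+1, ..., m".
# Unlike replace above, it does not check that the range sits inside [ ].
sub expand_ranges {
    my ($list) = @_;
    $list =~ s{(\d+)\s*-\s*(\d+)}{
        $1 <= $2 ? join(", ", $1 .. $2) : die "descending range $1-$2\n"
    }gex;
    return $list;
}

print expand_ranges("1, 3-7, 10-11"), "\n";   # prints 1, 3, 4, 5, 6, 7, 10, 11
```

The informal tests that follow exercise the full replace function, not this sketch.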

Some informal tests:

for my $h ( '[1, 3-7,  10-11 ]',
            '[  0]',
            '[1  ]',
            '[ ]',
            '[ 9-19]',
            '[9-9]',
            '[19-9]',
            '[19-z]'
          ) {
  my $r = eval { replace($h); };
  if ( $@ ) {
    $r = $@;
  }
  printf "%s\n%s\n\n", $h, $r;
}

Output:

[1, 3-7,  10-11 ]
[
<a href="#1">1</a>,
<a href="#3">3</a>,
<a href="#4">4</a>,
<a href="#5">5</a>,
<a href="#6">6</a>,
<a href="#7">7</a>,
<a href="#10">10</a>,
<a href="#11">11</a> ]

[  0]
[
<a href="#0">0</a>]

[1  ]
[
<a href="#1">1</a>  ]

[ ]
[ ]

[ 9-19]
[
<a href="#9">9</a>,
<a href="#10">10</a>,
<a href="#11">11</a>,
<a href="#12">12</a>,
<a href="#13">13</a>,
<a href="#14">14</a>,
<a href="#15">15</a>,
<a href="#16">16</a>,
<a href="#17">17</a>,
<a href="#18">18</a>,
<a href="#19">19</a>]

[9-9]
[
<a href="#9">9</a>]