How do I store regex results in Perl to build a replacement string?

Time: 2015-03-05 18:23:29

Tags: regex perl hashtable

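As a beginner I looked at this question, but I'm not sure whether I need a hash table to store the intermediate results. If so, fine, but I'm new to Perl, so I'm not sure.

It seems this has to be done in a loop: store each result in a scalar, apply it, then move on to the next line. But again, I'm new to this.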

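  1. Scan each line for a pattern. In this case the input is HTML. Yes, I know about HTML and regular expressions, but without a regex, how would I build a string dynamically from a search pattern?

  2. If the pattern matches, use the matched string A to form a new string B.

  3. Scan the line again and replace A with B.

  4. In other words: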

    $stringA = 'alias="@[found by $pattern]"'
    $stringB = 'alias="@[prepended string] . [found by $pattern] . [appended string]"' 
    

    What I have so far:

    my $pattern = 'alias="@(.*?)"';
    my %seen    = ();                 # ?
    
    sub read_file {
      my ($file) = @_;
    
      open FILE, '<:encoding(UTF-8)', $file or die "Could not open '$file' for reading $!";
      local $/ = undef;
    
      while ( my $line = <FILE> ) {
        if ( $line =~ /($pattern)/ ) {
          $seen{$1};                  # store results
          return $line;
        }
      }
    
      close FILE;
    }
    
    use Data::Dumper;
    say Dumper( \%seen );
    

2 answers:

Answer 0: (score: 1)

I think you want

$line =~ s/($pattern)/ transform($1) /eg;

where transform($1) is the code that derives B from A ($1).
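As a minimal sketch of what such a transform could look like: the `$prefix` and `$suffix` values below are hypothetical placeholders, and the sample line is made up for illustration. Unlike the one-liner above, the pattern here is not wrapped in an extra pair of parentheses, so `$1` holds only the alias name rather than the whole attribute.

```perl
use strict;
use warnings;

# Hypothetical affixes, chosen only for illustration.
my $prefix = 'pre_';
my $suffix = '_post';

# Matches alias="@name" and captures just the name in $1.
my $pattern = qr/alias="\@(.*?)"/;

# Derive string B from the captured alias name.
sub transform {
    my ($name) = @_;
    return 'alias="@' . $prefix . $name . $suffix . '"';
}

my $line = '<foo alias="@foobar">it\'s foo!</foo>';

# /e evaluates the replacement as Perl code; /g rewrites every match.
( my $rewritten = $line ) =~ s/$pattern/transform($1)/eg;

print "$rewritten\n";
```

Because the substitution computes B at the moment A is matched, no hash or second scan is needed for this particular task.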


For a non-regex solution, XPath can be used to identify HTML nodes, using a language that is simpler than regex patterns.

use XML::LibXML;

my $xpath = '//@alias[starts-with(., "@")]';

# $qfn holds the path of the HTML file.
my $doc = XML::LibXML->new->parse_html_file($qfn);

for my $node ($doc->findnodes($xpath)) {
   transform($node);
}

$doc->toFile($qfn);

Answer 1: (score: 1)

There are several comments in the code. Sample output is below. I'm not sure this is what you want, but hopefully something in it helps.

use strict;
use warnings;

my $pattern = 'alias="@(.*?)"';
my %seen    = (); # defines an empty hash

sub read_file {
    my ($file) = @_;

    # open using lexical filehandle
    open (my $fp, '<:encoding(UTF-8)', $file)
      or die "Could not open '$file' for reading $!";

    local $/ = undef; # enables 'slurp mode', that is, lets you read the entire file into one scalar.

    my $line = <$fp>;

    close ($fp); # it's all read in, so it can be safely closed here.

    # loop and use the g modifier to process every match.  
    # see the perlre man page for full discussion of modifiers
    while ( $line =~ /($pattern)/smg ) {
        $seen{$1} = 0 if (!exists ($seen{$1}));
        ++$seen{$1};
    }
}

# There was no call to read_file.  This is just a "serving suggestion":
my $filename = $ARGV[0] || die "USAGE: $0 filename\n";
read_file ($filename);

use Data::Dumper;
print Dumper( \%seen );   # use 'print', not 'say'

I ran it with some sample data, as shown by this egrep output:

$ egrep '<(foo|bar)' index.html 
<foo alias="@foobar">it's foo!</foo>
<bar alias="@barfoo">it's bar!</bar>

The results are:

$ perl foo.pl index.html 
$VAR1 = {
          'alias="@foobar"' => 1,
          'alias="@barfoo"' => 1
        };
$
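The hash keys in that output are whole attributes because the match wraps `$pattern` in an extra pair of parentheses: `$1` is the full `alias="@..."` text and `$2` is the inner capture. If only the alias names are wanted as keys, a sketch along these lines might do; the inline sample HTML, including the third `baz` line, is an assumption added for illustration.

```perl
use strict;
use warnings;

my $pattern = 'alias="@(.*?)"';
my %seen;

# Inline sample data standing in for the slurped file contents.
my $html = <<'END_HTML';
<foo alias="@foobar">it's foo!</foo>
<bar alias="@barfoo">it's bar!</bar>
<baz alias="@foobar">it's foo again!</baz>
END_HTML

# With the extra parentheses, $1 is the whole attribute and $2 is the name.
while ( $html =~ /($pattern)/g ) {
    $seen{$2}++;    # a missing key starts at undef, which ++ treats as 0
}

# Print the counts in a stable order.
print "$_ => $seen{$_}\n" for sort keys %seen;
```

The post-increment on a missing key makes the separate `exists` check unnecessary, which keeps the loop body to one line.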