Question

I looked at this question for beginners, but I'm not sure whether I need a hash to store the intermediate results. If so, fine, but I'm new to Perl, so I'm not sure.

It seems this has to be done in a loop: store each result in a scalar, apply it, then move on to the next line. But again, I'm new to this.

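Scan the lines for a pattern. In this case, HTML. Yes, I know about HTML and regular expressions, but without a regex, how would I build a string dynamically from a search pattern?
If the pattern matches, use the string A that was formed to derive a new string B.
Scan the lines again and replace A with B.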

In other words:

$stringA = 'alias="@[found by $pattern]"'
$stringB = 'alias="@[prepended string] . [found by $pattern] . [appended string]"'

What I have so far:

my $pattern = 'alias="@(.*?)"';
my %seen    = ();                 # ?

sub read_file {
  my ($file) = @_;

  open FILE, '<:encoding(UTF-8)', $file or die "Could not open '$file' for reading $!";
  local $/ = undef;

  while ( my $line = <FILE> ) {
    if ( $line =~ /($pattern)/ ) {
      $seen{$1};                  # store results
      return $line;
    }
  }

  close FILE;
}

use Data::Dumper;
say Dumper( \%seen );

Answer 1

I think you want

$line =~ s/($pattern)/ transform($1) /eg;

where transform($1) is the code that derives B from A ($1).
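As a rough illustration of that substitution, here is a minimal sketch. The body of transform, the $prefix and $suffix values, and the sample line are assumptions made up for the example; only $pattern and the s///eg call come from the posts above.

use strict;
use warnings;

# Pattern from the question: its own capture group is the name after "@".
my $pattern = 'alias="@(.*?)"';

# Hypothetical affixes, chosen only for this example.
my ($prefix, $suffix) = ('pre_', '_post');

# Hypothetical transform: takes string A (the whole matched attribute)
# and returns string B with the affixes wrapped around the captured name.
sub transform {
    my ($string_a) = @_;
    my ($name) = $string_a =~ /$pattern/;    # pull the name back out of A
    return 'alias="@' . $prefix . $name . $suffix . '"';
}

# Sample input, modelled on the HTML in the question.
my $line = q{<foo alias="@foobar">it's foo!</foo> <bar alias="@barfoo">it's bar!</bar>};

# /e evaluates the replacement as Perl code; /g repeats it for every match.
# The outer parentheses make the whole match available as $1.
$line =~ s/($pattern)/ transform($1) /eg;

print "$line\n";
# prints: <foo alias="@pre_foobar_post">it's foo!</foo> <bar alias="@pre_barfoo_post">it's bar!</bar>

Because the replacement is computed per match, no hash of intermediate results is needed unless the matches have to be inspected or counted separately.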

For a non-regex solution, XPath can be used to identify the HTML nodes, using a language that is simpler than regex patterns.

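use XML::LibXML;

# Select every alias attribute whose value starts with "@".
my $xpath = '//@alias[starts-with(., "@")]';

# $qfn holds the path of the HTML file.
my $doc = XML::LibXML->new->parse_html_file($qfn);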

for my $node ($doc->findnodes($xpath)) {
   transform($node);
}

$doc->toFile($qfn);
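The answer leaves transform($node) undefined. One possible shape for it is sketched below, assuming each node is an attribute node and that the new value is built from hypothetical $prefix and $suffix strings. The inline HTML string stands in for the file and is also an assumption.

use strict;
use warnings;
use XML::LibXML;

# Hypothetical affixes, chosen only for this example.
my ($prefix, $suffix) = ('pre_', '_post');

# Hypothetical transform: rewrites the attribute value in place.
sub transform {
    my ($attr) = @_;
    my $name = substr($attr->getValue, 1);    # drop the leading "@"
    $attr->setValue('@' . $prefix . $name . $suffix);
}

# Sample markup parsed from a string instead of a file.
my $html = '<p alias="@foobar">it is foo</p><p alias="plain">untouched</p>';
my $doc  = XML::LibXML->new->parse_html_string($html);

# Same XPath expression as in the answer.
for my $node ($doc->findnodes('//@alias[starts-with(., "@")]')) {
    transform($node);
}

# Collect the attribute values to check what changed.
my @values = map { $_->getValue } $doc->findnodes('//@alias');
print "$_\n" for @values;

Only the attribute whose value starts with "@" is rewritten. The second one is left alone because the XPath predicate filters it out before transform is ever called.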

Answer 2

There are a few comments in the code. Sample output follows. I'm not sure this does what you're after, but hopefully something in it helps.

use strict;
use warnings;

my $pattern = 'alias="@(.*?)"';
my %seen    = (); # defines an empty hash

sub read_file {
    my ($file) = @_;

    # open using lexical filehandle
    open (my $fp, '<:encoding(UTF-8)', $file)
      or die "Could not open '$file' for reading $!";

    local $/ = undef; # effects 'slurp mode', that is, lets you read the entire file into one scalar.

    my $line = <$fp>;

    close ($fp); # it's all read in, so it can be safely closed here.

    # loop and use the g modifier to process every match.  
    # see the perlre man page for full discussion of modifiers
    while ( $line =~ /($pattern)/smg ) {
        $seen{$1} = 0 if (!exists ($seen{$1}));
        ++$seen{$1};
    }
}

# There was no call to read_file.  This is just a "serving suggestion":
my $filename = $ARGV[0] || die "USAGE: $0 filename\n";
read_file ($filename);

use Data::Dumper;
print Dumper( \%seen );   # 'print', because 'say' needs "use feature 'say'"

I ran it against some sample data, shown here by the egrep output:

$ egrep '<(foo|bar)' index.html 
<foo alias="@foobar">it's foo!</foo>
<bar alias="@barfoo">it's bar!</bar>

The result:

$ perl foo.pl index.html 
$VAR1 = {
          'alias="@foobar"' => 1,
          'alias="@barfoo"' => 1
        };
$

How do I store regex results in Perl for use in building a replacement string?
