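I've looked at this question, but as a beginner I'm not sure whether I need a hash to store the intermediate results. If so, fine, but I'm new to Perl, so I can't tell.
It seems this has to be done in a loop: store each result in a scalar, apply it, then move on to the next line. But again, I'm new at this.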
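If each match only affects the line it sits on, a single loop can do the job without a hash: build B in a scalar, substitute it, and move on. Below is a minimal sketch of that idea, not a definitive implementation. The sub name rewrite_lines and the prefix/suffix arguments are hypothetical, the input is read from an in-memory string instead of a file so the example is self-contained, and only the first alias on each line is rewritten.

```perl
use strict;
use warnings;

# Rewrite alias="@X" as alias="@<prefix>X<suffix>", one line at a time.
# rewrite_lines, $prefix and $suffix are hypothetical names for illustration.
sub rewrite_lines {
    my ( $text, $prefix, $suffix ) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory text: $!";
    my $out = '';
    while ( my $line = <$in> ) {
        # Only the first alias on a line is handled (no /g here).
        if ( $line =~ /(alias="\@(.*?)")/ ) {
            my $string_a = $1;                               # A: the whole attribute
            my $string_b = qq{alias="\@$prefix$2$suffix"};   # B: prefix . found . suffix
            $line =~ s/\Q$string_a\E/$string_b/;             # replace A with B on this line
        }
        $out .= $line;                                       # then move on to the next line
    }
    close $in;
    return $out;
}

my $html = <<'HTML';
<foo alias="@foobar">it's foo!</foo>
<p>no alias here</p>
HTML
my $rewritten = rewrite_lines( $html, 'pre-', '-post' );
print $rewritten;
```

\Q...\E quotes A before it is used as a pattern, so characters like dots or brackets in the alias value are matched literally.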
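1. Scan the lines for a pattern (in this case, in HTML). Yes, I know about HTML and regular expressions, but without a regex, how would I dynamically build a string from the search pattern?
2. If the pattern matches, use the string A that was formed to obtain a new string B.
3. Scan the lines again and replace A with B.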
In other words:
$stringA = 'alias="@[found by $pattern]"'
$stringB = 'alias="@[prepended string] . [found by $pattern] . [appended string]"'
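A hash becomes useful for the two-pass shape described above: collect every distinct A first, remember the B derived from it, then run a second scan that swaps each A for its B. The sketch below assumes that shape. two_pass_rewrite and its prefix/suffix arguments are hypothetical names, and here %seen maps A to B instead of counting matches.

```perl
use strict;
use warnings;

# Pass 1 collects every distinct A (alias="@X") into %seen, mapped to its B.
# Pass 2 replaces each A with its B in a single s///g.
# two_pass_rewrite, $prefix and $suffix are hypothetical names.
sub two_pass_rewrite {
    my ( $text, $prefix, $suffix ) = @_;
    my %seen;    # A => B

    # Pass 1: scan for the pattern and derive B from each new A.
    while ( $text =~ /(alias="\@(.*?)")/g ) {
        $seen{$1} //= qq{alias="\@$prefix$2$suffix"};
    }
    return ( $text, \%seen ) unless %seen;    # nothing matched

    # Pass 2: one alternation of all A strings (quotemeta escapes them), then swap.
    my $alternation = join '|', map { quotemeta($_) } keys %seen;
    $text =~ s/($alternation)/$seen{$1}/g;
    return ( $text, \%seen );
}

my ( $result, $map ) = two_pass_rewrite(
    qq{<a alias="\@x">1</a> <b alias="\@x">2</b> <c alias="\@y">3</c>\n},
    'pre-', '-post'
);
print $result;
print "$_ => $map->{$_}\n" for sort keys %$map;
```

Storing A => B means each B is built once even if the same alias appears many times, which is the main reason to reach for a hash here.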
What I have so far:
my $pattern = 'alias="@(.*?)"';
my %seen = (); # ?
sub read_file {
    my ($file) = @_;
    open FILE, '<:encoding(UTF-8)', $file or die "Could not open '$file' for reading $!";
    local $/ = undef;
    while ( my $line = <FILE> ) {
        if ( $line =~ /($pattern)/ ) {
            $seen{$1}; # store results
            return $line;
        }
    }
    close FILE;
}
use Data::Dumper;
say Dumper( \%seen );
Answer 0 (score: 1)
I think you want
$line =~ s/($pattern)/ transform($1) /eg;
where transform($1) is the code that derives B from A ($1).
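Because $pattern already contains a capture group, wrapping it in another pair of parentheses makes $1 the whole A (for example alias="@foobar") and $2 the part inside the quotes. A minimal sketch of what transform() might look like under that assumption follows; the 'pre-'/'-post' strings and the rewrite() wrapper are hypothetical placeholders, not part of the answer.

```perl
use strict;
use warnings;

# The question's pattern. Single quotes keep the @ literal.
my $pattern = 'alias="@(.*?)"';

# Hypothetical transform(): given A (e.g. alias="@foobar"), return B by
# wrapping the captured part in placeholder strings 'pre-' and '-post'.
sub transform {
    my ($string_a) = @_;
    my ($found) = $string_a =~ /^alias="\@(.*)"$/;
    return qq{alias="\@pre-$found-post"};
}

# Hypothetical wrapper applying the substitution from the answer.
# /e runs the replacement as Perl code; /g visits every match.
# The outer parentheses make $1 the whole A; $pattern's own group would be $2.
sub rewrite {
    my ($text) = @_;
    $text =~ s/($pattern)/ transform($1) /eg;
    return $text;
}

my $out = rewrite(qq{<foo alias="\@foobar">x</foo> <bar alias="\@barfoo">y</bar>});
print "$out\n";
```

If transform() only needs the captured part, passing $2 instead of $1 would save re-parsing A inside the sub.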
For a non-regex solution, XPath can be used to identify the HTML nodes with a language that is simpler than regex patterns.
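use XML::LibXML;
my $qfn = $ARGV[0];    # path of the HTML file to process
my $xpath = '//@alias[starts-with(., "@")]';    # every alias attribute whose value starts with "@"
my $doc = XML::LibXML->new->parse_html_file($qfn);
for my $node ($doc->findnodes($xpath)) {
    transform($node);    # transform() is your code; $node is an attribute node (XML::LibXML::Attr)
}
$doc->toFile($qfn);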
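Answer 1 (score: 1)
There are several comments in the code, and sample output follows. I'm not sure this is exactly what you're after, but hopefully something in it helps.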
use strict;
use warnings;
my $pattern = 'alias="@(.*?)"';
my %seen = (); # defines an empty hash
sub read_file {
    my ($file) = @_;
    # open using a lexical filehandle
    open (my $fp, '<:encoding(UTF-8)', $file)
        or die "Could not open '$file' for reading: $!";
    local $/ = undef; # enables 'slurp mode', i.e. the whole file is read into one scalar
    my $line = <$fp>;
    close ($fp); # it's all read in, so it can safely be closed here
    # loop with the g modifier to process every match
    # see the perlre man page for a full discussion of modifiers
    while ( $line =~ /($pattern)/smg ) {
        $seen{$1} = 0 if (!exists ($seen{$1}));
        ++$seen{$1};
    }
}
# The original code never called read_file. This is just a "serving suggestion":
my $filename = $ARGV[0] || die "USAGE: $0 filename\n";
read_file ($filename);
use Data::Dumper;
print Dumper( \%seen ); # use 'print': 'say' only works after 'use feature "say";' (or 'use v5.10;')
I ran it on some sample data, shown here with egrep:
$ egrep '<(foo|bar)' index.html
<foo alias="@foobar">it's foo!</foo>
<bar alias="@barfoo">it's bar!</bar>
The result:
$ perl foo.pl index.html
$VAR1 = {
'alias="@foobar"' => 1,
'alias="@barfoo"' => 1
};
$
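The script above only counts the matches. If the goal is also to perform the replacement in the file itself, one possible extension is to slurp the file, run a single s///g over it, and write the result back. This is a sketch, not part of the original answer: rewrite_file and the prefix/suffix arguments are hypothetical, it reuses the answer's UTF-8 layers, and it overwrites the input file, so try it on a copy first.

```perl
use strict;
use warnings;

# Slurp a file, replace every alias="@X" with alias="@<prefix>X<suffix>",
# and write the result back. rewrite_file, $prefix and $suffix are hypothetical.
# Returns the number of replacements made.
sub rewrite_file {
    my ( $file, $prefix, $suffix ) = @_;

    open my $in, '<:encoding(UTF-8)', $file
        or die "Could not open '$file' for reading: $!";
    my $content = do { local $/; <$in> };    # slurp mode, scoped to this block
    close $in;

    # s///g returns the number of substitutions (or '' if none, hence || 0)
    my $count = ( $content =~ s/alias="\@(.*?)"/alias="\@$prefix$1$suffix"/g ) || 0;

    open my $out, '>:encoding(UTF-8)', $file
        or die "Could not open '$file' for writing: $!";
    print {$out} $content;
    close $out or die "Could not close '$file': $!";
    return $count;
}

if (@ARGV) {
    my $n = rewrite_file( $ARGV[0], 'pre-', '-post' );
    print "$n replacement(s) made in $ARGV[0]\n";
}
```

Doing the replacement in one s///g over the slurped text avoids a separate second scan, since the substitution itself finds every A.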