Question

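I want to find the positions of some characters so I can process them without resorting to monstrous recursion and inefficient regexes. Here is how I do it: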

my @charpos=();
s/(?=([«»\n]))/push @charpos, [$1, 0+($-[0])]; "";/ge;
# sort {$a->[1] <=> $b->[1]} @charpos;

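But this solution uses the «substitute» operator to replace nothing with an empty string. Is that normal? Should the commented-out line be uncommented?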
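To make the fragment above concrete, here is a minimal sketch that runs the same s///ge on a copy of an assumed sample string (the «...» string used in the answers below). Because s///g scans left to right, the [char, position] pairs already come out in ascending position order, so the commented-out sort would not change anything. A bare sort in void context would also just discard its result.

```perl
use strict;
use warnings;
use utf8;    # so « and » count as one character each

# Assumed sample input, taken from the answers below.
my $text = q{«...«...»...«...»...»};

my @charpos;
for (my $copy = $text) {    # alias $_ to a copy so $text itself is not rewritten
    # Zero-width lookahead: $1 is the character, $-[0] its offset; "" replaces nothing.
    s/(?=([«»\n]))/push @charpos, [$1, $-[0]]; "";/ge;
}

# s///g walks the string left to right, so this sort is a no-op here.
my @sorted = sort { $a->[1] <=> $b->[1] } @charpos;

print join(' ', map { $_->[1] } @charpos), "\n";
```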

Answer 1

In general, to find the positions of characters in a string, you can do this:

my $str = ...;
my @pos;
push @pos, pos $str while $str =~ /(?=[...])/g;

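All positions at which the regex matched then end up in @pos. With this approach, at least, you are not constantly rewriting the source string.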
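A self-contained version of the idiom above, filling in an assumed input (three copies of the «...» line, each ending in a newline) and the character class from the question. With `use utf8`, pos() reports character offsets rather than byte offsets.

```perl
use strict;
use warnings;
use utf8;    # « and » are single characters, so pos() counts characters

# Assumed sample input: three copies of the «...» line, each ending in "\n".
my $str = qq{«...«...»...«...»...»\n} x 3;

my @pos;
# The lookahead is zero-width, so pos() sits at the start of each target character;
# Perl's zero-length-match guard moves the next attempt forward, so this terminates.
push @pos, pos $str while $str =~ /(?=[«»\n])/g;

print "@pos\n";
```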

Answer 2

For your general problem, you might want to look at sub parse_line in Text::ParseWords.

In the context of the code you gave in your question, I would avoid modifying the source string:

#!/usr/bin/perl

use utf8;
use strict; use warnings;

my $x = q{«...«...»...«...»...»};

my @pos;

while ( $x =~ /([«»\n])/g ) {
    push @pos, $-[1];
}

use YAML;
print Dump \@pos;
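If you also want the character itself, as the question's @charpos does, the same read-only loop can push [char, offset] pairs. This is a small sketch assuming the same sample string; $-[1] is the start offset of capture group 1 from the @- array.

```perl
use strict;
use warnings;
use utf8;

my $x = q{«...«...»...«...»...»};

# Same [character, offset] shape as @charpos in the question,
# but $x is only matched against, never substituted into.
my @charpos;
while ($x =~ /([«»\n])/g) {
    push @charpos, [ $1, $-[1] ];
}

# Print only the offsets to keep the output ASCII.
print join(' ', map { $_->[1] } @charpos), "\n";
```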

Answer 3

There's more than one way to skin a cat:

#!/usr/bin/env perl

use 5.010;
use utf8;
use strict;
use warnings qw< FATAL all >;
use autodie;
use open qw< :std OUT :utf8 >;

END { close STDOUT }

my @pos = ();
my $string = q{«...«...»...«...»...»};
($string .= "\n") x= 3;

say "string is:\n$string";

for ($string) {
    push @pos, pos while m{
        (?= [«»\n] )
    }sxg;
}
say "first  test matches \@ @pos";

@pos = ();

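## this smokes :)
## ("ignify" is a constant in void context; strings beginning with "ig"
##  are exempt from Perl's "Useless use of a constant" warning)
"ignify" while $string =~ m{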
    [«»\n]
    (?{ push @pos, $-[0] })
}gx;
say "second test matches \@ @pos";

__END__
string is:
«...«...»...«...»...»
«...«...»...«...»...»
«...«...»...«...»...»

first  test matches @ 0 4 8 12 16 20 21 22 26 30 34 38 42 43 44 48 52 56 60 64 65
second test matches @ 0 4 8 12 16 20 21 22 26 30 34 38 42 43 44 48 52 56 60 64 65

But do heed Sinan's advice.

Answer 4

Adding some regex-free cat skinning to the mix. Whether it is monstrous is in the eye of the beholder:

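use strict;
use warnings;
use utf8;                     # count « and » as one character each, not as UTF-8 bytes
use List::Util q/min/;
use constant INF => 9**9**9;  # infinity; a bare Inf is a disallowed bareword under strict

my @targets = ('«','»',"\n");
my $x = q{«...«...»...«...»...»};
# earliest occurrence of any target, or INF if none is found
my $pos = min map { my $z = index($x,$_); $z < 0 ? INF : $z } @targets;
my @pos;
while ($pos < INF) {
    push @pos, $pos;
    # next occurrence of any target strictly after the current one
    $pos = min map { my $z = index($x,$_,$pos+1); $z < 0 ? INF : $z } @targets;
}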
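Another regex-free sketch, using a different technique from the index-based loop above: walk the string once with substr and test each character against a hash of targets. This touches every character exactly once instead of calling index once per target per hit. The hash name %is_target is just illustrative, and the input is the same assumed sample string.

```perl
use strict;
use warnings;
use utf8;

my $x       = q{«...«...»...«...»...»};
my @targets = ('«', '»', "\n");

# Set of target characters for constant-time membership tests.
my %is_target = map { $_ => 1 } @targets;

my @pos;
for my $i (0 .. length($x) - 1) {
    push @pos, $i if $is_target{ substr($x, $i, 1) };
}

print "@pos\n";
```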

Find the position of given characters
