Question

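I am working on a Perl assignment. One of the requirements is to match all integers and floating-point numbers, except those inside comments or strings (double- or single-quoted).

These are my assumptions:

A number has an optional sign, an integer part, and a fraction part.
If the integer part is omitted, the fraction is required.
If the fraction is omitted, the decimal point must be omitted too.

This is the regex I came up with.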

([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))
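One way to check the pattern against the assumptions above is to anchor it and try a few strings. This is only a minimal sketch: the helper name `is_number` and the candidate strings are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# The pattern from the question, anchored so the whole string must be a number.
# Inner groups are non-capturing because only a yes/no answer is needed here.
my $number = qr/\A[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)\z/;

# Hypothetical helper: returns 1 if the string is a number under the
# assumptions listed in the question, 0 otherwise.
sub is_number {
    my ($candidate) = @_;
    return $candidate =~ $number ? 1 : 0;
}

for my $candidate (qw(42 -3.22 +0.01 .5 5. . abc)) {
    printf "%-6s %s\n", $candidate,
        is_number($candidate) ? 'number' : 'not a number';
}
```

Under these assumptions `5.` and a lone `.` are rejected, because a decimal point is only allowed when digits follow it.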

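Here is my code block. I could not find a way to exclude the numbers in comments and strings, so I remove all comments and strings first. I also split the lines into words, which I thought would make things easier, although I suspect it is not necessary.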

while (<$IN_FILE>) {
    s/^(#[^!]+$)//;            # remove whole line comments
    s{(^[^#]+?)(#[^/]+$)}{$1}; # remove inline comments
    s/('.*?'|".*?")//g;        # remove all single line strings
    push @words, split;        # split line into words
}

foreach my $item (<@words>) {
    push @numbers, $1 if $item =~ /([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))/;
}

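It works fine, except that it fails to match array indexes such as the 0 in $ARGV[0].

So I need some help improving my code. It would be nice if I did not have to remove comments and strings first and did not have to split the lines into words, while of course still not matching any numbers inside comments and strings.

Sample input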

# Comment 1
my $time = <STDIN>;
chomp $time;
   #now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   print "true or false";
   return 3 + 4 eq "7"; # true or false
}

Below is the output of my code, which misses the 0 in $ARGV[0] and the 11 in (8..11). I would not be surprised if it missed more.

[Numbers]
3.1415926
-3.22
+0.01
8
2
3
4

Answer 1

The main problem is here:

foreach my $item (<@words>) {

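You want to iterate over @words, so the <> is not needed. The angle brackets turn the expression into a glob, which changes the list you are iterating over. Just insert

warn "\t$item\n";

into the last loop to see what is actually being processed.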

Even after fixing this, (8..11) is still treated as a single "word". Your match has no /g modifier, so you cannot get more than one number out of an item.
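Both points can be seen in a small sketch. The word list below is hypothetical and stands in for the asker's @words. The inner groups of the pattern are made non-capturing so that a list-context /g match returns one value per number.

```perl
use strict;
use warnings;

# Hypothetical word list standing in for the asker's @words.
my @words = ('$ARGV[0];', '(8..11);', '-3.22;');

# <@words> means glob("@words"): every word is treated as a filename pattern.
# '[0]' is a glob character class, so '$ARGV[0];' survives only if a file
# named '$ARGV0;' happens to exist in the current directory.
my @globbed = <@words>;
print "globbed: @globbed\n";

# Iterating over the array itself and matching with /g collects every number
# in each item, not just the first one.
my $number = qr/([-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))/;
my @numbers;
for my $item (@words) {
    push @numbers, $item =~ /$number/g;
}
print "numbers: @numbers\n";
```

With this unmodified pattern the range (8..11) comes out as 8 and .11, so the pattern itself still needs tightening even once the loop is fixed.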

Answer 2

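As choroba has already pointed out, your use of <@words> is an obvious bug.

However, you can simplify things by not splitting the lines into words first and instead matching with /g: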

use strict;
use warnings;

my @numbers;
while (<DATA>) {
    s/^(#[^!]+$)//;            # remove whole line comments
    s{(^[^#]+?)(#[^/]+$)}{$1}; # remove inline comments
    s/('.*?'|".*?")//g;        # remove all single line strings

    while (/([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))/g) {
        push @numbers, $1;
    }
}

print "@numbers";

__DATA__
# Comment 1
my $time = <STDIN>;
chomp $time;
   #now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   print "true or false";
   return 3 + 4 eq "7"; # true or false
}

This ends up producing too many results. One solution is to add word boundaries around the number in the regex:

while (/([-+]?\b([0-9]+(\.[0-9]+)?|\.[0-9]+))\b/g) {

Output:

3.1415926 -3.22 +0.01 8 11 0 3 4
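To see what the word boundaries change, here is a small comparison of the two patterns. This is a sketch only: the helper `numbers_in` and the two test lines are made up for illustration, and the inner groups are non-capturing so that each match yields a single value.

```perl
use strict;
use warnings;

# The pattern from the question, and the same pattern with \b added.
my $plain   = qr/([-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))/;
my $bounded = qr/([-+]?\b(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))\b/;

# Hypothetical helper: return every number a pattern finds in a line.
sub numbers_in {
    my ($pattern, $line) = @_;
    return $line =~ /$pattern/g;
}

# Hypothetical test lines, not taken from the sample input.
for my $line ('sub sample2 { $ARGV[0] + (8..11) }', 'my $half = .5;') {
    print "line:    $line\n";
    print "plain:   @{[ numbers_in($plain, $line) ]}\n";
    print "bounded: @{[ numbers_in($bounded, $line) ]}\n";
}
```

On the first line the plain pattern picks up the 2 in sample2 and reads the range as 8 and .11, while the bounded pattern yields 0, 8 and 11. The second line shows a trade-off of the bounded pattern: because \b cannot match between a space and a dot, a float written without an integer part such as .5 is reported as 5.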

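The best way to do this would be to use PPI. That is definitely beyond the scope of what your professor is trying to teach you, but to demonstrate: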

use strict;
use warnings;

use PPI;

my $src = do {local $/; <DATA>};

# Load a document
my $doc = PPI::Document->new( \$src );

# Find all the number tokens within the doc
# (find returns a false value when nothing matches, hence the fallback)
my $nums = $doc->find( 'PPI::Token::Number' ) || [];
for (@$nums) {
    print $_->content, "\n";
}

__DATA__
# Comment 1
my $time = <STDIN>;
chomp $time;
   #now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   print "true or false";
   return 3 + 4 eq "7"; # true or false
}

Output:
