Question

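I am trying to write a Perl script that extracts every string that does not start and end with a single quote. A string must not be part of a # comment, and the lines in DATA do not necessarily begin at the start of a line.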

use warnings;
use strict;


my $file; 
{ 
local $/ = undef; 
$file = <DATA>; 
};
my @strings = $file =~ /(?:[^']).*(?:[^'])/g;
print join ("\n",@strings);

__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";

I am getting nowhere with this regex. The expected output is:

"This is a string2"
"This is comment syntax #"
"This is string 4"

Answer 1

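Obviously this is just an exercise, since a lot of students have been asking about this lately. A regex will only get you part of the way there, because there will almost always be edge cases.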

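The following code may be good enough for your purposes, but it cannot even parse itself successfully because of the quotes inside the qr{}. You will have to work out for yourself how to handle strings that span lines: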

use strict;
use warnings;

my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;

my $data = do { local $/; <DATA> };

while ($data =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
    my $match = $1;

    if ($match =~ /^#/) {
        print "Comment - $match\n";

    } elsif ($match =~ /^"/) {
        print "Double quote - $match\n";

    } elsif ($match =~ /^'/) {
        print "Single quote - $match\n";

    } else {
        die "Carp!  something went wrong!  <$match>";
    }
}

__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
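
To get from the classified matches above to the output the question asks for, one possible approach is to keep matching comments and single-quoted strings (so that they are consumed) but collect only the matches that begin with a double quote. This is a minimal sketch that reuses the two patterns from the answer; the helper name double_quoted_strings is made up for illustration, and the sample is passed in through a here-doc instead of DATA.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): return every
# double-quoted string in $source, skipping comments and single-quoted strings.
sub double_quoted_strings {
    my ($source) = @_;

    my $doublequote_re = qr{"(?: (?> [^\\"]+ ) | \\. )*"}x;
    my $singlequote_re = qr{'(?: (?> [^\\']+ ) | \\. )*'}x;

    my @found;
    # Comments and single-quoted strings are still matched so that any
    # double quotes inside them are consumed, but only matches that start
    # with a double quote are kept.
    while ($source =~ m{(#.*|$singlequote_re|$doublequote_re)}g) {
        my $match = $1;
        push @found, $match if $match =~ /^"/;
    }
    return @found;
}

# Same sample as the DATA section above; <<'...' disables interpolation.
my $sample = <<'END_SAMPLE';
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
END_SAMPLE

print "$_\n" for double_quoted_strings($sample);
# Prints:
#   "This is string2"
#   "This is comment syntax #"
#   "This is string4"
```

The same edge cases as before still apply: quote-like operators such as q{} and qq{}, here-docs, and a # inside a regex are not recognised.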

Answer 2

I don't know how to do this with a regex, so here is a simple hand-written lexer:

#!/usr/bin/perl

use strict;
use warnings;

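# Return the first double-quoted string in the input together with the
# unscanned remainder. Two empty strings are returned when a comment
# starts or when no complete string is left.
sub extract_string {
    my @buf = split //, shift;

    # Test with defined() so that a literal "0" character does not end the loop.
    while (defined(my $peer = shift @buf)) {
        if ($peer eq '"') {
            my $str = $peer;
            while (defined($peer = shift @buf)) {
                $str .= $peer;
                last if $peer eq '"';
            }
            # $peer is undef here if the buffer ran out before a closing quote.
            if (defined $peer) {
                return ($str, join '', @buf);
            }
            else {
                return ("", "");
            }
        }
        elsif ($peer eq '#') {
            # The rest of the line is a comment.
            return ("", "");
        }
    }
    return ("", "");
}

my ($str, $buf);

while ($buf = <DATA>) {
    chomp $buf;
    while (1) {
        ($str, $buf) = extract_string $buf;
        print "$str\n" if length $str;
        last unless length $buf;
    }
}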

__DATA__
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";

Another option is to use a Perl module such as PPI.
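
As a rough sketch of what the PPI route could look like: PPI parses the source into a document tree, and double-quoted literals appear in it as PPI::Token::Quote::Double tokens, while comments become separate comment tokens and so are not searched for strings. This assumes PPI is installed from CPAN; the helper name ppi_double_quoted is hypothetical, and qq{} strings and here-docs would need their own token classes.

```perl
use strict;
use warnings;
use PPI;    # CPAN module, assumed to be installed

# Hypothetical helper: return the double-quoted string literals that PPI
# finds in a piece of Perl source, quotes included.
sub ppi_double_quoted {
    my ($source) = @_;

    # PPI::Document->new accepts a reference to a string of source code.
    my $doc = PPI::Document->new(\$source)
        or die "PPI could not parse the source";

    # find() returns an array reference, or a false value when nothing matches.
    my $tokens = $doc->find('PPI::Token::Quote::Double') || [];
    return map { $_->content } @$tokens;
}

# Same sample as the DATA section above; <<'...' disables interpolation.
my $sample = <<'END_SAMPLE';
my $string = 'This is string1';
"This is string2"
# comment : "This is string3"
print "This is comment syntax #"."This is string4";
END_SAMPLE

print "$_\n" for ppi_double_quoted($sample);
```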
