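I am trying to do pattern matching in Perl. For each line I read from a file, I check for non-whitespace characters at the start of the line and return the first matching word.
The problem is that sometimes the word ends with a ':' and sometimes it does not.
For example:
Say I have a file with the following content, and sometimes with the alternate content shown further down. The file is populated automatically.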
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
Alternate content:
some1: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
some3: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
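Now I want to extract only the first word of each line. If the file has the alternate content, I still want just the first word, without the trailing ':'.
I only need the pattern-matching part here. This is what I have so far: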
foreach ...
if (/^(\S+):/) {
print $1;
}
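With the pattern above I get the first word from the alternate content, i.e. some1 and some3 without the trailing ':', but with the original content nothing matches and $1 is not set.
But if I use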
foreach ...
if (/^(\S+)/) {
print $1;
}
then the alternate content no longer gives what I want: $1 keeps the trailing ':' (some1:, some3:).
Any hints?
Answer 0 (score: 2)
Use a greedy match on a character class that excludes both whitespace and colons:
while (<DATA>) {
if (/^([^:\s]+)/) {
print "$1\n";
}
}
__DATA__
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some more content added to the file
Alternate content:
some1: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
some3: Loren Posem:is some will be different with some number 5423:32
some more content added to the file
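One thing to keep in mind with the character class is that it stops at the first colon anywhere in the word, not only at a trailing one. The sample data has no such first words, so this may never matter here. As a minimal sketch, assuming a first word such as foo:bar could appear (the input lines and both helper names below are hypothetical, not from the question), this compares the pattern above with a variant that drops only a single trailing colon:

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern from this answer.
# Captures everything up to the first colon or whitespace character.
sub first_word_class {
    my ($line) = @_;
    return $line =~ /^([^:\s]+)/ ? $1 : undef;
}

# Hypothetical variant: lazily take non-whitespace characters, then
# allow one optional colon that must be followed by whitespace or the
# end of the string. Inner colons stay part of the word.
sub first_word_trailing {
    my ($line) = @_;
    return $line =~ /^(\S+?):?(?=\s|\z)/ ? $1 : undef;
}

for my $line ( 'some0 Loren Posem:is', 'some1: Loren Posem:is', 'foo:bar baz' ) {
    printf "%-22s class=%s trailing=%s\n", $line,
        first_word_class($line), first_word_trailing($line);
}
```

For the two sample-style lines both helpers return some0 and some1. For the made-up line the character class returns foo while the variant returns foo:bar, so which one fits depends on what the first column can contain.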
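Answer 1 (score: 1)
If you are dealing with large data sets, splitting (and setting split's LIMIT) to get the first word can give a noticeable performance benefit over a capturing regex in this case: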
foreach ...
if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
print $firstWord, "\n";
}
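A small caveat on the truthiness test in that snippet: a first word of "0" is false in Perl, and a line that starts with whitespace yields an empty first field, so both would be skipped. As a minimal sketch, assuming those cases can occur in your file (the helper name first_word_split and the last two input lines are hypothetical), a defined-and-length check handles them explicitly:

```perl
use strict;
use warnings;

# Hypothetical helper: first word via split with LIMIT 2, so Perl
# stops splitting after the first separator it finds.
sub first_word_split {
    my ($line) = @_;
    my ($first) = split /[:\s]/, $line, 2;

    # Test defined-ness and length rather than truth, so "0" survives.
    return ( defined $first && length $first ) ? $first : undef;
}

for my $line ( 'some0 Loren Posem:is', 'some1: Loren Posem:is', '0: zero', '  indented' ) {
    my $word = first_word_split($line);
    print defined $word ? "'$word'" : 'no word', "\n";
}
```

This prints 'some0', 'some1' and '0' for the first three lines and "no word" for the indented one, because the leading whitespace produces an empty first field.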
use strict;
use warnings;
use Benchmark qw/cmpthese/;
my @data = <DATA>;
sub _split {
for (@data) {
if ( my $firstWord = ( split /[:\s]/, $_, 2 )[0] ) {
#print $firstWord, "\n";
}
}
}
sub _regex {
for (@data) {
if ( my ($firstWord) = /^([^:\s]+)/ ) {
#print $firstWord, "\n";
}
}
}
cmpthese(
-5,
{
_split => sub { _split() },
_regex => sub { _regex() }
}
);
__DATA__
some0 Loren Posem:is some color::and some foo bar with 1023:4632
some3 Loren Posem:is some color::and some foo bar with 1023:4632
some1: Loren Posem:is some will be different with some number 5423:3
some3: Loren Posem:is some will be different with some number 5423:32
Output (the faster entry is lower in the table):
           Rate _regex _split
_regex 396843/s     --   -12%
_split 450546/s    14%     --
However, you may find the regex more readable.
Hope this helps!