I would like to match dates that have the following format:
2010-08-27 02:11:36
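i.e. yyyy-mm-dd hh:mm:ss.
For now I am not particularly concerned with whether the date is actually valid, only that it is in the correct format.
Formats that should match (for this example):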
2010
2010-08
2010-08-27
2010-08-27 02
2010-08-27 02:11
2010-08-27 02:11:36
What would be a concise regular expression for this in Perl?
So far I have this (which works, by the way):
/\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?/
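Can this be improved for performance?
Answer 0 (score: 8)
Since there is no capture group around the year, I assume you only care whether a date matches.
I tried several different patterns related to your question, and the one that gave a 10 to 15 percent improvement was disabling captures, i.e.,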
/\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/
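The perlre documentation covers (?:...):
(?:pattern)
(?imsx-imsx:pattern)
This is for clustering, not capturing; it groups subexpressions like "()", but doesn't make backreferences as "()" does. So
@fields = split(/\b(?:a|b|c)\b/)
is like
@fields = split(/\b(a|b|c)\b/)
but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.
Any letters between ? and : act as flags modifiers as with (?imsx-imsx). For example,
/(?s-i:more.*than).*million/i
is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i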
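To make the quoted split behaviour concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the documentation.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for this illustration.
my $text = "1 a 2 b 3";

# Capturing group: the matched separators come back as extra fields.
my @with_capture = split /\b(a|b|c)\b/, $text;

# Non-capturing group: only the text between separators comes back.
my @without_capture = split /\b(?:a|b|c)\b/, $text;

print scalar(@with_capture), " fields with capture\n";
print scalar(@without_capture), " fields without capture\n";
```

With the capturing version the separators "a" and "b" appear in the result list, which is the "extra fields" effect the documentation mentions.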
Benchmark output:
              Rate     U  U/NC CH/NC/A CH/NC/A/U    CH CH/NC  null
U          31811/s    --  -32%    -58%      -59%  -61%  -66%  -93%
U/NC       46849/s   47%    --    -38%      -39%  -42%  -50%  -90%
CH/NC/A    76119/s  139%   62%      --       -1%   -6%  -18%  -84%
CH/NC/A/U  76663/s  141%   64%      1%        --   -6%  -17%  -84%
CH         81147/s  155%   73%      7%        6%    --  -13%  -83%
CH/NC      92789/s  192%   98%     22%       21%   14%    --  -81%
null      481882/s 1415%  929%    533%      529%  494%  419%    --
Code:
#! /usr/bin/perl

use warnings;
use strict;

use Benchmark qw/ :all /;

# CH: chain of nested optional groups, capturing, unanchored.
sub option_chain {
  local($_) = @_;
  /\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?/
}

# CH/NC: same chain with non-capturing groups.
sub option_chain_nocap {
  local($_) = @_;
  /\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/
}

# CH/NC/A: non-capturing chain anchored at both ends.
sub option_chain_nocap_anchored {
  local($_) = @_;
  /\A\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/
}

# CH/NC/A/U: anchored chain with \d\d in place of \d{2}.
# Note that the groups in this variant still capture.
sub option_chain_anchored_unrolled {
  local($_) = @_;
  /\A\d\d\d\d(-\d\d(-\d\d( \d\d(:\d\d(:\d\d)?)?)?)?)?\z/
}

# Defined here but not included in the benchmark below.
sub simple_split {
  local($_) = @_;
  split /[ :-]/;
}

# U: one alternative per accepted format, capturing.
# The space between date and time is escaped because /x ignores
# unescaped whitespace in the pattern.
sub unrolled {
  local($_) = @_;
  grep defined($_),
    /\A (\d\d\d\d)-(\d\d)-(\d\d)\ (\d\d):(\d\d):(\d\d) \z
    |\A (\d\d\d\d)-(\d\d)-(\d\d)\ (\d\d):(\d\d) \z
    |\A (\d\d\d\d)-(\d\d)-(\d\d)\ (\d\d) \z
    |\A (\d\d\d\d)-(\d\d)-(\d\d) \z
    |\A (\d\d\d\d)-(\d\d) \z
    |\A (\d\d\d\d) \z
    /x;
}

# U/NC: one alternative per accepted format, no captures.
sub unrolled_nocap {
  local($_) = @_;
  grep defined($_),
    /\A \d\d\d\d-\d\d-\d\d\ \d\d:\d\d:\d\d \z
    |\A \d\d\d\d-\d\d-\d\d\ \d\d:\d\d \z
    |\A \d\d\d\d-\d\d-\d\d\ \d\d \z
    |\A \d\d\d\d-\d\d-\d\d \z
    |\A \d\d\d\d-\d\d \z
    |\A \d\d\d\d \z
    /x;
}

# null: baseline that only measures the call overhead.
sub id { $_[0] }

my @examples = (
  "xyz",
  "2010",
  "2010-08",
  "2010-08-27",
  "2010-08-27 02",
  "2010-08-27 02:11",
  "2010-08-27 02:11:36",
);

cmpthese -1 => {
  "CH"        => sub { option_chain                   $_ for @examples },
  "CH/NC"     => sub { option_chain_nocap             $_ for @examples },
  "CH/NC/A"   => sub { option_chain_nocap_anchored    $_ for @examples },
  "CH/NC/A/U" => sub { option_chain_anchored_unrolled $_ for @examples },
  "U"         => sub { unrolled                       $_ for @examples },
  "U/NC"      => sub { unrolled_nocap                 $_ for @examples },
  "null"      => sub { id                             $_ for @examples },
};
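Answer 1 (score: 5)
How about something from Regexp::Common::time?
Answer 2 (score: 3)
Your regex is fine, except that it lacks anchors (unless you want to match the 2008 in "abc200890"?). Assuming you want to match the whole string: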
/^\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/
You should use (?:...) if you don't actually want the captured substrings, which I guess is the case here.
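The effect of the anchors can be seen with a short sketch. The variable names and the extra test inputs ("2010-08-" in particular) are my own additions for illustration.

```perl
use strict;
use warnings;

# The question's pattern with non-capturing groups, without and with anchors.
my $unanchored = qr/\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/;
my $anchored   = qr/^\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/;

# Hypothetical inputs: two malformed strings and two well-formed ones.
for my $input ("abc200890", "2010-08-", "2010-08", "2010-08-27 02:11:36") {
    printf "%-20s unanchored=%-3s anchored=%s\n",
        $input,
        ($input =~ $unanchored ? "yes" : "no"),
        ($input =~ $anchored   ? "yes" : "no");
}
```

The unanchored pattern accepts every input here because it only needs four consecutive digits somewhere in the string, while the anchored one rejects "abc200890" and the trailing hyphen in "2010-08-".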
Answer 3 (score: 2)
I would use the split function:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @dates = (
    '2010',
    '2010-08',
    '2010-08-27',
    '2010-08-27 02',
    '2010-08-27 02:11',
    '2010-08-27 02:11:36',
);

for (@dates) {
    my @list = split /[ :-]/;
    print Dumper(\@list);
}
Output:
$VAR1 = [
'2010'
];
$VAR1 = [
'2010',
'08'
];
$VAR1 = [
'2010',
'08',
'27'
];
$VAR1 = [
'2010',
'08',
'27',
'02'
];
$VAR1 = [
'2010',
'08',
'27',
'02',
'11'
];
$VAR1 = [
'2010',
'08',
'27',
'02',
'11',
'36'
];
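Note that split on its own does not check the format: a string such as "xyz" simply comes back as a single field. If validation matters, one possible sketch is to test the string against an anchored pattern first and split only on success. The helper name date_fields is hypothetical, and the pattern is the anchored, non-capturing form of the one in the question.

```perl
use strict;
use warnings;

# Anchored, non-capturing form of the pattern from the question.
my $format = qr/\A\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/;

# Hypothetical helper: returns the fields, or an empty list for bad input.
sub date_fields {
    my ($input) = @_;
    return unless $input =~ $format;
    return split /[ :-]/, $input;
}

for my $input ('2010-08-27 02:11', 'xyz', '2010-8') {
    my @fields = date_fields($input);
    print "$input => ", (@fields ? join(',', @fields) : 'rejected'), "\n";
}
```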
Answer 4 (score: 1)
This matches all of the above (but also other things, see the comments!), and may be easier to understand:
/(\d{4})(-\d{2})?(\w{1}\d{2})?(:\d{2})?/
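To see what "other things" means in practice, the following sketch prints the portion of each input that this pattern actually consumes. The helper name matched_part and the sample inputs are assumptions made for illustration.

```perl
use strict;
use warnings;

# The pattern from this answer, unchanged.
my $loose = qr/(\d{4})(-\d{2})?(\w{1}\d{2})?(:\d{2})?/;

# Hypothetical helper: returns the matched substring, or undef if no match.
sub matched_part {
    my ($input) = @_;
    return $input =~ $loose ? $& : undef;
}

for my $input ("2010-08-27 02:11:36", "2010:11", "2010x08") {
    my $part = matched_part($input);
    print "'$input' matched, consuming '$part'\n" if defined $part;
}
```

Because the pattern is unanchored and every group after the year is independently optional, it reports a match on the full timestamp while consuming only "2010-08", and it also accepts strings such as "2010:11" and "2010x08" that are not in the list of desired formats.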
Answer 5 (score: 1)
If you want it faster, get away from regexes and look at an XS module: Date::Calc is a good one.