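I am using Perl to parse dates and datetimes entered by users who are not careful about format. The Perl module Date::Parse seems great, because it handles most of the cases I need.
The exception, as I discovered today, is datetimes between 1901-01-01 00:00:00 and 1968-12-31 23:59:59. For those, Date::Parse str2time adds an extra 100 years when parsing the datetime into epoch time.
Here is the code I use to parse the datetimes: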
#!/usr/bin/perl
#---------------------------------------------------------------------
# format_date.pl
#
# format variable date inputs
#---------------------------------------------------------------------
use strict;
use warnings;
use Date::Parse;
use DateTime;
my $DEFAULT_TIME_ZONE = "GMT";
my @dates = (
"1899-06-24 09:44:00",
"1900-12-31 23:59:59",
"1901-01-01 00:00:00",
"1960-12-31 23:59:59",
"1966-06-24 09:44:00",
"1968-12-31 23:59:59",
"1969-01-01 00:00:00",
"1969-12-31 23:59:59",
"1970-01-01 00:00:01",
"2000-01-01 00:00:00",
"2017-06-24 23:59:59",
"2018-06-24 09:44:00",
"2238-06-24 09:44:00"
);
foreach my $string (@dates) {
# format datetime field from any valid datetime input
# default time zone is used if timezone is not included in string
my $epoch = str2time( $string, $DEFAULT_TIME_ZONE );
# error if date is not correctly parsed
if ( !$epoch ) {
die("ERROR ====> invalid datetime ($string), "
. "datetime format should be YYYY-MM-DD HH:MM:SS");
}
my $date = DateTime->from_epoch( epoch => $epoch );
printf( "formatting datetime: value = %20s, epoch = %20u, "
. "date = %20s\n", $string, $epoch, $date );
}
exit 0;
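Side note: I need to improve the error handling, because the valid date 1970-01-01 00:00:00 raises an error.
The extra 100 years for dates between 1901 and 1969 can be seen in the output: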
formatting datetime: value = 1899-06-24 09:44:00, epoch = 18446744071484095456, date = 1899-06-24T09:44:00
formatting datetime: value = 1900-12-31 23:59:59, epoch = 18446744071532098815, date = 1900-12-31T23:59:59
formatting datetime: value = 1901-01-01 00:00:00, epoch = 978307200, date = 2001-01-01T00:00:00
formatting datetime: value = 1960-12-31 23:59:59, epoch = 2871763199, date = 2060-12-31T23:59:59
formatting datetime: value = 1966-06-24 09:44:00, epoch = 3044598240, date = 2066-06-24T09:44:00
formatting datetime: value = 1968-12-31 23:59:59, epoch = 3124223999, date = 2068-12-31T23:59:59
formatting datetime: value = 1969-01-01 00:00:00, epoch = 18446744073678015616, date = 1969-01-01T00:00:00
formatting datetime: value = 1969-12-31 23:59:59, epoch = 18446744073709551615, date = 1969-12-31T23:59:59
formatting datetime: value = 1970-01-01 00:00:01, epoch = 1, date = 1970-01-01T00:00:01
formatting datetime: value = 2000-01-01 00:00:00, epoch = 946684800, date = 2000-01-01T00:00:00
formatting datetime: value = 2017-06-24 23:59:59, epoch = 1498348799, date = 2017-06-24T23:59:59
formatting datetime: value = 2018-06-24 09:44:00, epoch = 1529833440, date = 2018-06-24T09:44:00
formatting datetime: value = 2238-06-24 09:44:00, epoch = 8472332640, date = 2238-06-24T09:44:00
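The Date::Parse documentation indicates that it can handle dates at least as old as 1901-01-01. The Time::Local documentation indicates that it should be able to handle even earlier dates.
How do I deal with this oddity? Is there a better way to parse variable input formats with Perl?
The input can be in a variety of formats. Here is a set of examples: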
my @dates = (
"2018-02-20 00:00:00",
"20180220",
"02/20/2018",
"02/20/18", # interpreted as 1918-02-20
"2018-02-20"
);
Answer 0 (score: 2)
The underlying question was answered by tangent.
The problem is in Date::Parse: see this issue. Full answer: perlmonks - tangent
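To illustrate how a 100-year shift can arise, here is a minimal sketch that uses only the core Time::Local module. It relies on Time::Local's documented rule that timegm treats a year in the range 0..99 as a two-digit year inside a rolling window around the current year, while a year above 999 is taken literally. That Date::Parse ends up handing such a reduced year to Time::Local is an assumption here, not something quoted from the linked issue, and $offset_year is a hypothetical variable name.

```perl
use strict;
use warnings;
use Time::Local qw( timegm );

# Hypothetical "years since 1900" value, as a parser might produce for 1901.
my $offset_year = 1901 - 1900;

# A year in 0..99 is read as a two-digit year near the current year.
my $epoch_two_digit = timegm( 0, 0, 0, 1, 0, $offset_year );

# A year above 999 is taken as the actual year.
my $epoch_four_digit = timegm( 0, 0, 0, 1, 0, 1901 );

printf "year %4d => epoch %d (%s)\n",
    $offset_year, $epoch_two_digit, scalar gmtime($epoch_two_digit);
printf "year %4d => epoch %d (%s)\n",
    1901, $epoch_four_digit, scalar gmtime($epoch_four_digit);
```

With a current year anywhere near the present, the first call lands in 2001, which matches the epoch 978307200 shown for 1901-01-01 in the question's output.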
My solution is to use Date::Parse strptime instead of str2time.
Date::Parse strptime parses the date into a list ($ss, $mm, $hh, $day, $month, $year, $zone). This allows the year to be converted back to a 4-digit year with:
if ( $year < 1000 ) { $year += 1900; }
The date is then passed to DateTime->new().
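The steps above could be put together roughly as follows. This is a sketch under stated assumptions, not the code from the perlmonks thread: parse_to_datetime is a hypothetical helper name, missing time fields default to zero, fractional seconds are dropped, and the parsed $zone is ignored in favour of UTC.

```perl
use strict;
use warnings;
use Date::Parse qw( strptime );
use DateTime;

# parse_to_datetime is a hypothetical helper, not part of either module.
sub parse_to_datetime {
    my ($string) = @_;

    my ( $ss, $mm, $hh, $day, $month, $year, $zone ) = strptime($string);
    return unless defined $year && defined $month && defined $day;

    # Undo the 1900 offset when strptime returns a shortened year.
    $year += 1900 if $year < 1000;

    return DateTime->new(
        year      => $year,
        month     => $month + 1,         # strptime months are 0-based
        day       => $day,
        hour      => $hh // 0,           # date-only input has no time part
        minute    => $mm // 0,
        second    => int( $ss // 0 ),    # drop any fractional seconds
        time_zone => 'UTC',              # assumption: $zone is ignored
    );
}

for my $string ( '1901-01-01 00:00:00', '1966-06-24 09:44:00', '2018-02-20' ) {
    my $dt = parse_to_datetime($string);
    print "$string => ", ( defined $dt ? $dt->iso8601 : 'unparsed' ), "\n";
}
```

Because a two-digit input year such as 18 is also below 1000, this sketch maps it to 1918, the same interpretation noted in the question's example list.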
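Following a discussion with thanos on perlmonks, I explored using the Date::Manip module to parse the datetimes. This reduces parsing the variable input to a single line. It even handles 2-digit years correctly. Here is the code snippet: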
say UnixDate( ParseDate($_), '%Y-%m-%d %T' ) for (@dates);
See the sample script and output on perlmonks.
Answer 1 (score: 1)
Just adding another possible solution, using the module Date::Manip.
use Date::Manip;
use feature 'say';
foreach my $datestr (@dates) {
my $epochSecs = UnixDate($datestr,'%s');
my $date = UnixDate( ParseDateString("epoch $epochSecs"), "%Y-%m-%d %T");
say "Date value = ".$datestr.", epoch = ".$epochSecs.", date = " .$date;
}
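Hope this helps, BR.
Answer 2 (score: 0)
Epoch time is the number of seconds since 1970-01-01T00:00:00Z. The dates you are trying to convert to epoch time are earlier than that.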
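As a small illustration of what that means in practice, the sketch below uses only the core Time::Local module and assumes a Perl built with 64-bit integers. It shows that a pre-1970 instant is a negative epoch, that printing it with %u (as the question's printf does) wraps it into a huge positive number, and that epoch 0 is a valid value that is false in Perl, so a truth test such as !$epoch rejects it.

```perl
use strict;
use warnings;
use Time::Local qw( timegm );

# One second before the epoch, given with a four-digit year.
my $before = timegm( 59, 59, 23, 31, 11, 1969 );

# The epoch itself: a valid result that happens to be false in Perl.
my $zero = timegm( 0, 0, 0, 1, 0, 1970 );

printf "signed:   %d\n", $before;    # negative value
printf "unsigned: %u\n", $before;    # wraps to a large positive number

# Test for definedness rather than truth, so that epoch 0 is accepted.
for my $epoch ( $before, $zero ) {
    print defined $epoch ? "ok: $epoch\n" : "parse failed\n";
}
```

Checking defined $epoch instead of !$epoch would let 1970-01-01 00:00:00 through while still catching the undef that str2time returns for unparseable input.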
Why use two different datetime libraries? If you want DateTime objects, use the DateTime modules.
use DateTime::Format::DateParse qw( );
for my $dt_str (@dates) {
my $dt = DateTime::Format::DateParse->parse_datetime($dt_str, $DEFAULT_TIME_ZONE)
or die(...);
...
}
Output:
1899-06-24 09:44:00 => 3799-06-24T09:44:00 <- doh!
1900-12-31 23:59:59 => 3800-12-31T23:59:59 <- doh!
1901-01-01 00:00:00 => 1901-01-01T00:00:00
1960-12-31 23:59:59 => 1960-12-31T23:59:59
1966-06-24 09:44:00 => 1966-06-24T09:44:00
1968-12-31 23:59:59 => 1968-12-31T23:59:59
1969-01-01 00:00:00 => 1969-01-01T00:00:00
1969-12-31 23:59:59 => 1969-12-31T23:59:59
1970-01-01 00:00:01 => 1970-01-01T00:00:01
2000-01-01 00:00:00 => 2000-01-01T00:00:00
2017-06-24 23:59:59 => 2017-06-24T23:59:59
2018-06-24 09:44:00 => 2018-06-24T09:44:00
2238-06-24 09:44:00 => 2238-06-24T09:44:00
2018-02-20 00:00:00 => 2018-02-20T00:00:00
20180220 => 2018-02-20T00:00:00
02/20/2018 => 2018-02-20T00:00:00
02/20/18 => 1918-02-20T00:00:00
2018-02-20 => 2018-02-20T00:00:00
Let's avoid DateParse entirely.
use DateTime::Format::Strptime qw( );
use List::MoreUtils qw( first_result );
my @patterns = (
'%Y-%m-%d %H:%M:%S',
'%Y-%m-%d',
'%Y%m%d',
'%m/%d/%Y',
'%m/%d/%y',
);
my @formats =
map {
DateTime::Format::Strptime->new(
pattern => $_,
time_zone => $DEFAULT_TIME_ZONE,
on_error => 'undef',
)
}
@patterns;
for my $dt_str (@dates) {
my $dt = first_result { $_->parse_datetime($dt_str) } @formats
or die(...);
...
}
Output:
1899-06-24 09:44:00 => 1899-06-24T09:44:00
1900-12-31 23:59:59 => 1900-12-31T23:59:59
1901-01-01 00:00:00 => 1901-01-01T00:00:00
1960-12-31 23:59:59 => 1960-12-31T23:59:59
1966-06-24 09:44:00 => 1966-06-24T09:44:00
1968-12-31 23:59:59 => 1968-12-31T23:59:59
1969-01-01 00:00:00 => 1969-01-01T00:00:00
1969-12-31 23:59:59 => 1969-12-31T23:59:59
1970-01-01 00:00:01 => 1970-01-01T00:00:01
2000-01-01 00:00:00 => 2000-01-01T00:00:00
2017-06-24 23:59:59 => 2017-06-24T23:59:59
2018-06-24 09:44:00 => 2018-06-24T09:44:00
2238-06-24 09:44:00 => 2238-06-24T09:44:00
2018-02-20 00:00:00 => 2018-02-20T00:00:00
20180220 => 2018-02-20T00:00:00
02/20/2018 => 2018-02-20T00:00:00
02/20/18 => 2018-02-20T00:00:00
2018-02-20 => 2018-02-20T00:00:00