Perl Date::Parse - how to correctly parse dates between 1901 and 1969

Time: 2018-02-19 20:26:39

Tags: perl date datetime time

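Background

I am using Perl to parse dates and datetimes entered by users who are not careful about format. The Perl module Date::Parse looks great, since it handles most of the cases I need to handle.

The exception, as I discovered today, is datetimes between 1901-01-01 00:00:00 and 1968-12-31 23:59:59. For those datetimes, Date::Parse str2time adds an extra 100 years when parsing the datetime to epoch time.

Code

Here is the code I use to parse the datetimes: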

#!/usr/bin/perl
#---------------------------------------------------------------------
# format_date.pl
#
# format variable date inputs
#---------------------------------------------------------------------

use strict;
use warnings;

use Date::Parse;
use DateTime;

my $DEFAULT_TIME_ZONE = "GMT";

my @dates = (
    "1899-06-24 09:44:00",
    "1900-12-31 23:59:59",
    "1901-01-01 00:00:00",
    "1960-12-31 23:59:59",
    "1966-06-24 09:44:00",
    "1968-12-31 23:59:59",
    "1969-01-01 00:00:00",
    "1969-12-31 23:59:59",
    "1970-01-01 00:00:01",
    "2000-01-01 00:00:00",
    "2017-06-24 23:59:59",
    "2018-06-24 09:44:00",
    "2238-06-24 09:44:00"

);

foreach my $string (@dates) {

    # format datetime field from any valid datetime input
    # default time zone is used if timezone is not included in string
    my $epoch = str2time( $string, $DEFAULT_TIME_ZONE );

    # error if date is not correctly parsed
    if ( !$epoch ) {
        die("ERROR ====> invalid datetime ($string), "
        . "datetime format should be YYYY-MM-DD HH:MM:SS");
    }

    my $date = DateTime->from_epoch( epoch => $epoch );

    printf( "formatting datetime: value = %20s, epoch = %20u, "
            . "date = %20s\n", $string, $epoch, $date );

}

exit 0;

Side note: I need to improve the error handling, because the valid date 1970-01-01 00:00:00 raises an error.
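The failure on 1970-01-01 00:00:00 comes from testing the epoch for truth: that instant is epoch 0, which Perl treats as false. Below is a minimal sketch of a stricter check, assuming the parser returns undef when it cannot parse the input. `epoch_is_valid` is a hypothetical helper name, not part of Date::Parse.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a parser returned a usable epoch value.
# Uses defined() rather than truthiness, so epoch 0 is accepted.
sub epoch_is_valid {
    my ($epoch) = @_;
    return defined $epoch ? 1 : 0;
}

# Compare the truthiness test with the defined() test.
for my $epoch ( 0, 1, -1, undef ) {
    my $label = defined $epoch ? $epoch : 'undef';
    printf "%5s => truthy: %s, valid: %s\n",
        $label,
        ( $epoch ? 'yes' : 'no' ),
        ( epoch_is_valid($epoch) ? 'yes' : 'no' );
}
```

In the loop above, this would amount to replacing `if ( !$epoch )` with `if ( !defined $epoch )`.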

Output

The extra 100 years for dates between 1901 and 1969 can be seen in the output:

formatting datetime: value =  1899-06-24 09:44:00, epoch = 18446744071484095456, date =  1899-06-24T09:44:00
formatting datetime: value =  1900-12-31 23:59:59, epoch = 18446744071532098815, date =  1900-12-31T23:59:59
formatting datetime: value =  1901-01-01 00:00:00, epoch =            978307200, date =  2001-01-01T00:00:00
formatting datetime: value =  1960-12-31 23:59:59, epoch =           2871763199, date =  2060-12-31T23:59:59
formatting datetime: value =  1966-06-24 09:44:00, epoch =           3044598240, date =  2066-06-24T09:44:00
formatting datetime: value =  1968-12-31 23:59:59, epoch =           3124223999, date =  2068-12-31T23:59:59
formatting datetime: value =  1969-01-01 00:00:00, epoch = 18446744073678015616, date =  1969-01-01T00:00:00
formatting datetime: value =  1969-12-31 23:59:59, epoch = 18446744073709551615, date =  1969-12-31T23:59:59
formatting datetime: value =  1970-01-01 00:00:01, epoch =                    1, date =  1970-01-01T00:00:01
formatting datetime: value =  2000-01-01 00:00:00, epoch =            946684800, date =  2000-01-01T00:00:00
formatting datetime: value =  2017-06-24 23:59:59, epoch =           1498348799, date =  2017-06-24T23:59:59
formatting datetime: value =  2018-06-24 09:44:00, epoch =           1529833440, date =  2018-06-24T09:44:00
formatting datetime: value =  2238-06-24 09:44:00, epoch =           8472332640, date =  2238-06-24T09:44:00

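Additional notes

The Date::Parse documentation indicates that it can handle dates back to at least 1901-01-01. The Time::Local documentation indicates that it should be able to handle even earlier dates.

Question

How should I deal with this oddity? Is there a better way to parse variable input formats with Perl?

Edit: examples of multiple date formats

The input can be in a variety of formats. Here is a set of examples: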

my @dates = (
    "2018-02-20 00:00:00",
    "20180220",
    "02/20/2018",
    "02/20/18",    # interpreted as 1918-02-20
    "2018-02-20"
);

3 answers:

Answer 0: (score: 2)

The underlying question was answered by tangent.

The problem is in Date::Parse - see this issue. Full answer on perlmonks - tangent
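The 100-year shift is consistent with a two-digit year reaching Time::Local, which Date::Parse uses to build the epoch. That mechanism is an assumption here, not something quoted from the linked issue. The sketch below only demonstrates the documented Time::Local behavior: years above 999 are taken literally, while years 0..99 are mapped into a rolling century around the current date.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Arguments: sec, min, hour, mday, month (0..11), year.
my $literal_1901 = timegm( 0, 0, 0, 1, 0, 1901 );
my $literal_2001 = timegm( 0, 0, 0, 1, 0, 2001 );

# Year given as 1: treated as a two-digit year, so the century is chosen
# relative to the current date instead of being taken as 1901.
my $two_digit = timegm( 0, 0, 0, 1, 0, 1 );

printf "1901 literal : %d\n", $literal_1901;
printf "2001 literal : %d\n", $literal_2001;
printf "year 1       : %d\n", $two_digit;
```

The two literal values differ by exactly 100 years of seconds, and the 2001 value matches the epoch 978307200 printed for 1901-01-01 00:00:00 in the question's output.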

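Solution 1

My solution was to use Date::Parse strptime instead of str2time.

Date::Parse strptime parses the date into a list ($ss, $mm, $hh, $day, $month, $year, $zone). This makes it possible to convert the year back to a 4-digit year with the following: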

if ( $year < 1000 ) { $year += 1900; }

The date is then passed to DateTime->new().
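The conversion described above might look like the following sketch. `fields_to_datetime_args` is a hypothetical helper. It assumes the field order listed for strptime, a month in the range 0..11 (the Time::Local convention), and the year < 1000 rule from this answer. The sample field lists are written by hand to mimic what strptime is assumed to return, so the block runs without Date::Parse installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn strptime-style fields into named arguments
# of the kind DateTime->new() accepts.
sub fields_to_datetime_args {
    my ( $ss, $mm, $hh, $day, $month, $year ) = @_;

    # Rule from this answer: small years are offsets from 1900.
    $year += 1900 if $year < 1000;

    return (
        year   => $year,
        month  => $month + 1,          # assumed 0-based on input
        day    => $day,
        hour   => $hh // 0,
        minute => $mm // 0,
        second => int( $ss // 0 ),
    );
}

# Hand-written stand-ins for the fields of "1966-06-24 09:44:00" and
# "1899-06-24 09:44:00"; real code would call strptime($string) here.
my %d1966 = fields_to_datetime_args( 0, 44, 9, 24, 5, 66 );
my %d1899 = fields_to_datetime_args( 0, 44, 9, 24, 5, 1899 );

for my $d ( \%d1966, \%d1899 ) {
    printf "%04d-%02d-%02d %02d:%02d:%02d\n",
        @{$d}{qw(year month day hour minute second)};
}
```

The returned list could then be passed along the lines of `DateTime->new( fields_to_datetime_args(@fields), time_zone => 'GMT' )`.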

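Solution 2 (better)

Based on a discussion with thanos on perlmonks, I explored using the Date::Manip module to parse the datetimes. This reduces parsing the variable input to a single line. It even handles 2-digit years correctly. Here is the code snippet: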

say UnixDate( ParseDate($_), '%Y-%m-%d %T' ) for (@dates);

See the example script and output on perlmonks.

Answer 1: (score: 1)

Just adding another possible solution, using the module Date::Manip.

use Date::Manip;
use feature 'say';

foreach my $datestr (@dates) {
    my $epochSecs = UnixDate($datestr,'%s');
    my $date = UnixDate( ParseDateString("epoch $epochSecs"), "%Y-%m-%d %T");
    say "Date value =  ".$datestr.", epoch = ".$epochSecs.", date = " .$date;
}

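Hope this helps, BR.

Answer 2: (score: 0)

Epoch time is the number of seconds since 1970-01-01T00:00:00Z. The dates you are trying to convert to epoch time are earlier than that.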
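Dates before 1970 are still representable, as negative epoch values. The very large numbers in the question's output appear to come from printing those negative values with %u. Here is a small sketch using only core modules, assuming a Perl built with 64-bit integers:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# One second before the epoch, given with a four-digit year.
my $epoch = timegm( 59, 59, 23, 31, 11, 1969 );

my $signed    = sprintf '%d', $epoch;
my $unsigned  = sprintf '%u', $epoch;    # negative value shown as unsigned
my $roundtrip = strftime( '%Y-%m-%dT%H:%M:%S', gmtime($epoch) );

print "signed   : $signed\n";
print "unsigned : $unsigned\n";
print "gmtime   : $roundtrip\n";
```

Using %d (or %s) in the question's printf would show the pre-1970 epochs as negative numbers instead.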

Why use two different datetime libraries? If you want a DateTime object, use the DateTime modules.

use DateTime::Format::DateParse qw( );

for my $dt_str (@dates) {
    my $dt = DateTime::Format::DateParse->parse_datetime($dt_str, $DEFAULT_TIME_ZONE)
       or die(...);

    ...
}

Output:

1899-06-24 09:44:00 => 3799-06-24T09:44:00  <- doh!
1900-12-31 23:59:59 => 3800-12-31T23:59:59  <- doh!
1901-01-01 00:00:00 => 1901-01-01T00:00:00
1960-12-31 23:59:59 => 1960-12-31T23:59:59
1966-06-24 09:44:00 => 1966-06-24T09:44:00
1968-12-31 23:59:59 => 1968-12-31T23:59:59
1969-01-01 00:00:00 => 1969-01-01T00:00:00
1969-12-31 23:59:59 => 1969-12-31T23:59:59
1970-01-01 00:00:01 => 1970-01-01T00:00:01
2000-01-01 00:00:00 => 2000-01-01T00:00:00
2017-06-24 23:59:59 => 2017-06-24T23:59:59
2018-06-24 09:44:00 => 2018-06-24T09:44:00
2238-06-24 09:44:00 => 2238-06-24T09:44:00
2018-02-20 00:00:00 => 2018-02-20T00:00:00
20180220            => 2018-02-20T00:00:00
02/20/2018          => 2018-02-20T00:00:00
02/20/18            => 1918-02-20T00:00:00
2018-02-20          => 2018-02-20T00:00:00

Let's avoid DateParse entirely.

use DateTime::Format::Strptime qw( );
use List::MoreUtils            qw( first_result );

my @patterns = (
   '%Y-%m-%d %H:%M:%S',
   '%Y-%m-%d',
   '%Y%m%d',
   '%m/%d/%Y',
   '%m/%d/%y',
);

my @formats =
   map {
      DateTime::Format::Strptime->new(
         pattern   => $_,
         time_zone => $DEFAULT_TIME_ZONE,
         on_error  => 'undef',
      )
   }
      @patterns;

for my $dt_str (@dates) {
    my $dt = first_result { $_->parse_datetime($dt_str) } @formats
       or die(...);

    ...
}

Output:

1899-06-24 09:44:00 => 1899-06-24T09:44:00
1900-12-31 23:59:59 => 1900-12-31T23:59:59
1901-01-01 00:00:00 => 1901-01-01T00:00:00
1960-12-31 23:59:59 => 1960-12-31T23:59:59
1966-06-24 09:44:00 => 1966-06-24T09:44:00
1968-12-31 23:59:59 => 1968-12-31T23:59:59
1969-01-01 00:00:00 => 1969-01-01T00:00:00
1969-12-31 23:59:59 => 1969-12-31T23:59:59
1970-01-01 00:00:01 => 1970-01-01T00:00:01
2000-01-01 00:00:00 => 2000-01-01T00:00:00
2017-06-24 23:59:59 => 2017-06-24T23:59:59
2018-06-24 09:44:00 => 2018-06-24T09:44:00
2238-06-24 09:44:00 => 2238-06-24T09:44:00
2018-02-20 00:00:00 => 2018-02-20T00:00:00
20180220            => 2018-02-20T00:00:00
02/20/2018          => 2018-02-20T00:00:00
02/20/18            => 2018-02-20T00:00:00
2018-02-20          => 2018-02-20T00:00:00