Perl or Python: Convert date from dd/mm/yyyy to yyyy-mm-dd

Time: 2010-11-02 13:06:57

Tags: python perl date text-processing

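I have lots of dates in a column of a CSV file that I need to convert from dd/mm/yyyy to yyyy-mm-dd format. For example, 17/01/2010 should be converted to 2010-01-17.

How can I do this in Perl or Python?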

8 answers:

Answer 0 (score: 30)

If you are guaranteed to have well-formed data consisting of nothing but a single date in DD/MM/YYYY format, then this works:

# FIRST METHOD
my $ndate = join("-" => reverse split(m[/], $date));

That works on a $date holding "07/04/1776", but fails on "this 17/01/2010 and that 01/17/2010 there". Instead, use:

# SECOND METHOD
($ndate = $date) =~ s{
    \b
      ( \d \d   )
    / ( \d \d   )
    / ( \d {4}  )
    \b
}{$3-$2-$1}gx;
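
For comparison, the same substitution can be sketched in Python with the standard re module. This is only an illustration and not part of the original answer; the names DATE_RE and convert_dates are made up here.

```python
import re

# Hypothetical names; the pattern mirrors the SECOND METHOD regex above.
DATE_RE = re.compile(r"\b(\d\d)/(\d\d)/(\d{4})\b")

def convert_dates(text: str) -> str:
    """Rewrite every dd/mm/yyyy date found in text as yyyy-mm-dd."""
    return DATE_RE.sub(r"\3-\2-\1", text)

print(convert_dates("this 17/01/2010 and that 17/01/2010 there"))
```

Like the Perl version, this only rearranges digits; it does not check that the result is a real calendar date.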

If you prefer a more "grammatical" regex, one that is easier to maintain and update, you can use this instead:

# THIRD METHOD
($ndate = $date) =~ s{
    (?&break)

              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )

    (?&break)

    (?(DEFINE)
        (?<break> \b     )
        (?<slash> /      )
        (?<year>  \d {4} )
        (?<month> \d {2} )
        (?<day>   \d {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
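
Python's re module has no (?(DEFINE)...) block, but a similar separation of declaration from use can be approximated by assembling a verbose pattern from named fragments. The sketch below is one way to do that under that assumption; PARTS, DATE_RE and convert_dates are illustrative names, not an established idiom.

```python
import re

# Sub-patterns declared once, then assembled; all names are illustrative.
PARTS = {
    "brk": r"\b",
    "slash": r"/",
    "day": r"\d{2}",
    "month": r"\d{2}",
    "year": r"\d{4}",
}

DATE_RE = re.compile(
    r"""
    {brk}
              (?P<DAY>   {day}   )
    {slash}   (?P<MONTH> {month} )
    {slash}   (?P<YEAR>  {year}  )
    {brk}
    """.format(**PARTS),
    re.VERBOSE,
)

def convert_dates(text: str) -> str:
    """Rewrite dd/mm/yyyy dates as yyyy-mm-dd using the named groups."""
    return DATE_RE.sub(
        lambda m: "-".join(m.group("YEAR", "MONTH", "DAY")), text
    )

print(convert_dates("this 17/01/2010 and that 17/01/2010 there"))
```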

Lastly, if you have Unicode data, you may want to be a bit more careful.

# FOURTH METHOD
($ndate = $date) =~ s{
    (?&break_before)
              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )
    (?&break_after)

    (?(DEFINE)
        (?<slash>     /                  )
        (?<start>     \A                 )
        (?<finish>    \z                 )

        # don't really want to use \D or [^0-9] here:
        (?<break_before>
           (?<= [\pC\pP\pS\p{Space}] )
         | (?<= \A                )
        )
        (?<break_after>
            (?= [\pC\pP\pS\p{Space}]
              | \z
            )
        )
        (?<digit> \d            )
        (?<year>  (?&digit) {4} )
        (?<month> (?&digit) {2} )
        (?<day>   (?&digit) {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
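
The boundary test used in the fourth method (a date must be delimited by control, punctuation, symbol, or space characters, or by the ends of the string) can be approximated in Python with unicodedata, since the standard re module has no \p{...} properties. This is a rough sketch, not a faithful port: is_break and convert_dates are hypothetical names, and str.isspace() is assumed to be close enough to \p{Space}.

```python
import re
import unicodedata

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

def is_break(ch: str) -> bool:
    """True for control/format (C*), punctuation (P*), symbol (S*) or space."""
    return ch.isspace() or unicodedata.category(ch)[0] in "CPS"

def convert_dates(text: str) -> str:
    def repl(m):
        # Look at the characters on either side of the candidate date.
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if (before and not is_break(before)) or (after and not is_break(after)):
            return m.group(0)  # not properly delimited: leave untouched
        day, month, year = m.groups()
        return f"{year}-{month}-{day}"
    return DATE_RE.sub(repl, text)

sample = "from \u201c17/01/2010\u201d through\xa017/01/2010"
print(convert_dates(sample))
```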

You can see how each of these four approaches performs when confronted with sample input strings like these:

my $sample  = q(17/01/2010);
my @strings =  (
    $sample,  # trivial case

    # multiple case
    "this $sample and that $sample there",

    # multiple case with non-ASCII BMP code points
    # U+201C and U+201D are LEFT and RIGHT DOUBLE QUOTATION MARK
    "from \x{201c}$sample\x{201d} through\xA0$sample",

    # multiple case with non-ASCII code points
    #   from both the BMP and the SMP 
    # code point U+02013 is EN DASH, props \pP \p{Pd}
    # code point U+10179 is GREEK YEAR SIGN, props \pS \p{So}
    # code point U+110BD is KAITHI NUMBER SIGN, props \pC \p{Cf}
    "\x{10179}$sample\x{2013}\x{110BD}$sample",
);

Now, letting $date be the foreach iterator through that array, we get this output:

Original is:   17/01/2010
First method:  2010-01-17
Second method: 2010-01-17
Third method:  2010-01-17
Fourth method: 2010-01-17

Original is:   this 17/01/2010 and that 17/01/2010 there
First method:  2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method:  this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there

Original is:   from “17/01/2010” through 17/01/2010
First method:  2010-01-2010” through 17-01-from “17
Second method: from “2010-01-17” through 2010-01-17
Third method:  from “2010-01-17” through 2010-01-17
Fourth method: from “2010-01-17” through 2010-01-17

Original is:   𐅹17/01/2010–𑂽17/01/2010
First method:  2010-01-2010–𑂽17-01-𐅹17
Second method: 𐅹2010-01-17–𑂽2010-01-17
Third method:  𐅹2010-01-17–𑂽2010-01-17
Fourth method: 𐅹2010-01-17–𑂽2010-01-17

Now let's suppose that you actually do want to match non-ASCII digits. For example:

   U+660  ARABIC-INDIC DIGIT ZERO
   U+661  ARABIC-INDIC DIGIT ONE
   U+662  ARABIC-INDIC DIGIT TWO
   U+663  ARABIC-INDIC DIGIT THREE
   U+664  ARABIC-INDIC DIGIT FOUR
   U+665  ARABIC-INDIC DIGIT FIVE
   U+666  ARABIC-INDIC DIGIT SIX
   U+667  ARABIC-INDIC DIGIT SEVEN
   U+668  ARABIC-INDIC DIGIT EIGHT
   U+669  ARABIC-INDIC DIGIT NINE

or even

 U+1D7F6  MATHEMATICAL MONOSPACE DIGIT ZERO
 U+1D7F7  MATHEMATICAL MONOSPACE DIGIT ONE
 U+1D7F8  MATHEMATICAL MONOSPACE DIGIT TWO
 U+1D7F9  MATHEMATICAL MONOSPACE DIGIT THREE
 U+1D7FA  MATHEMATICAL MONOSPACE DIGIT FOUR
 U+1D7FB  MATHEMATICAL MONOSPACE DIGIT FIVE
 U+1D7FC  MATHEMATICAL MONOSPACE DIGIT SIX
 U+1D7FD  MATHEMATICAL MONOSPACE DIGIT SEVEN
 U+1D7FE  MATHEMATICAL MONOSPACE DIGIT EIGHT
 U+1D7FF  MATHEMATICAL MONOSPACE DIGIT NINE

So imagine you have a date in mathematical monospace digits, like this:

$date = "\x{1D7F7}\x{1D7FD}/\x{1D7F7}\x{1D7F6}/\x{1D7F8}\x{1D7F6}\x{1D7F7}\x{1D7F6}";

The Perl code still works just fine:

Original is:   𝟷𝟽/𝟷𝟶/𝟸𝟶𝟷𝟶
First method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Second method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Third method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Fourth method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
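
As a point of comparison for readers on Python 3, where \d in a str pattern matches any Unicode decimal digit, the same input can be handled as sketched below. The helper to_ascii_digits is a made-up name, added only to show how such digits could be normalised to ASCII afterwards if the target format requires it.

```python
import re
import unicodedata

def to_ascii_digits(s: str) -> str:
    """Replace every Unicode decimal digit with its ASCII counterpart."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in s
    )

# The same date as above, written in MATHEMATICAL MONOSPACE digits.
date = ("\U0001D7F7\U0001D7FD/\U0001D7F7\U0001D7F6/"
        "\U0001D7F8\U0001D7F6\U0001D7F7\U0001D7F6")

swapped = re.sub(r"\b(\d\d)/(\d\d)/(\d{4})\b", r"\3-\2-\1", date)
print(swapped)                   # still in monospace digits
print(to_ascii_digits(swapped))  # 2010-10-17
```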

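I think you'll find that Python has a pretty brain-damaged Unicode model, whose lack of support for abstract characters and for strings irrespective of content makes it ridiculously difficult to write things like this.

It's also difficult to write legible regexes in Python, where you would want to decouple the declaration of subexpressions from their execution, because (?(DEFINE)...) blocks are not supported there. Heck, Python doesn't even support Unicode properties. That alone makes it unsuitable for Unicode regex work.

But hey, if you think that's bad in Python compared with Perl (and it certainly is), just try any other language. I haven't found one that isn't even worse for this sort of work.

As you can see, you run into real problems when you ask for regex solutions in multiple languages. First, the solutions are hard to compare because the regex flavors differ. But it's also because no other language can compare with Perl for power, expressivity, and maintainability in its regular expressions. This may become even more obvious once arbitrary Unicode enters the picture.

So if you just wanted Python, you should have asked only for that. Otherwise it's a terribly unfair contest that Python will nearly always lose; it's just too messy to get things like this correct in Python, let alone both correct and clean. That's asking more of it than it can deliver.

In contrast, Perl's regexes excel at both.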

Answer 1 (score: 17)

>>> from datetime import datetime
>>> datetime.strptime('02/11/2010', '%d/%m/%Y').strftime('%Y-%m-%d')
'2010-11-02'

Or a more hackish way (which doesn't check the validity of the values):

>>> '-'.join('02/11/2010'.split('/')[::-1])
'2010-11-02'
>>> '-'.join(reversed('02/11/2010'.split('/')))
'2010-11-02'
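
Since the question is about a column in a CSV file, the strptime approach can be wrapped around the csv module. The following is a minimal sketch under a few assumptions: the function name convert_column, the column index, and the sample data are all invented for illustration, and rows whose value is not a valid dd/mm/yyyy date (such as a header) are passed through unchanged.

```python
import csv
import io
from datetime import datetime

def convert_column(src, dst, column: int) -> None:
    """Copy CSV rows from src to dst, rewriting one dd/mm/yyyy column."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        try:
            parsed = datetime.strptime(row[column], "%d/%m/%Y")
            row[column] = parsed.strftime("%Y-%m-%d")
        except (ValueError, IndexError):
            pass  # header, short row, or invalid date: leave as-is
        writer.writerow(row)

# Made-up sample data; real code would pass open file objects instead.
src = io.StringIO("name,date\nalice,17/01/2010\nbob,31/02/2010\n")
dst = io.StringIO()
convert_column(src, dst, column=1)
print(dst.getvalue())
```

Here the impossible date 31/02/2010 is left untouched, which is the validity check the split-and-reverse variants lack.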

Answer 2 (score: 11)

Use Time::Piece (in core since 5.9.5). It is very similar to the accepted Python solution, since it provides strptime and strftime functions:

use Time::Piece;
my $dt_str = Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');

$ perl -MTime::Piece
print Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');
1979-10-13
$ 

Answer 3 (score: 6)

Go with Perl: the datetime Python package is just broken. You can simply use a regex to swap the date parts around, e.g.

echo "17/01/2010" | perl -pe 's{(\d+)/(\d+)/(\d+)}{$3-$2-$1}g'

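If you really need to parse these dates (e.g. to compute their day of the week or perform other calendar-type operations), look into DateTimeX::Easy (you can install it with apt-get under Ubuntu):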

perl -MDateTimeX::Easy -e 'print DateTimeX::Easy->parse("17/01/2010")->ymd("-")'
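
For readers working in Python instead, a comparable calendar-aware parse can be sketched with the standard datetime module. This is an illustrative aside, not part of the original answer; the explicit format string is what tells the parser that the day comes first.

```python
from datetime import datetime

# Parse with an explicit day-first format, then keep only the date part.
d = datetime.strptime("17/01/2010", "%d/%m/%Y").date()

print(d.isoformat())   # 2010-01-17
print(d.isoweekday())  # 7, i.e. Sunday (Monday is 1)
```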

Answer 4 (score: 5)

Perl:

while (<>) {
  # Lookarounds check for neighbouring digits without consuming (and losing) them.
  s/(?<!\d)(\d\d)\/(\d\d)\/(\d{4})(?!\d)/$3-$2-$1/g;
  print $_;
}

Then you have to run:

perl MyScript.pl < oldfile.txt > newfile.txt

Answer 5 (score: 1)

Perl:

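# $date must already hold the dd/mm/yyyy string; "my $date =~ ..." would act on a fresh, undefined variable.
$date =~ s/(\d+)\/(\d+)\/(\d+)/$3-$2-$1/;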

Answer 6 (score: 0)

In Perl you can do it like this:

use strict;
use warnings;
while (<>) {
    chomp;
    my ($d, $m, $y) = split /\//;   # day, month, year
    my $newDate = "$y-$m-$d";
    print "$newDate\n";             # emit the converted date
}

Answer 7 (score: -2)

In glorious perl-oneliner form:

echo 17/01/2010 | perl -p -e 'chomp; $_ = join("-", reverse split /\//) . "\n";'

But seriously, I would do it like this:

#!/usr/bin/env perl
while (<>) {
    chomp;
    print join('-', reverse split /\//), "\n";
}

This works on a pipe, converting and printing one date per line.
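
A comparable line-by-line filter can be sketched in Python. Unlike the split-and-reverse approach, this version validates each line with strptime and passes anything that is not a valid dd/mm/yyyy date through unchanged; that pass-through behaviour and the name convert_lines are choices made for this illustration only.

```python
from datetime import datetime
from typing import Iterable, Iterator

def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each dd/mm/yyyy line as yyyy-mm-dd; pass other lines through."""
    for line in lines:
        text = line.strip()
        try:
            yield datetime.strptime(text, "%d/%m/%Y").strftime("%Y-%m-%d")
        except ValueError:
            yield text  # not a valid date in this format: leave unchanged

for out in convert_lines(["17/01/2010\n", "02/11/2010\n", "n/a\n"]):
    print(out)
```

Passing sys.stdin instead of the sample list turns this into a pipe filter like the Perl script above.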