Question

Suppose I have a file with a name field and three date fields, and I want to reformat the dates. I can do it like this:

while (<DATA>) {
  my @lines = split(/\|/); ##splitting DATA by '|'

  my @dates = split( /\/|[-]/, $lines[0] ); #splitting only the first element of array and performing modifications below.
  if ( $dates[2] =~ /^[0-1][0-9]$/gi ) { $dates[2] = $dates[2] + 2000 }
  elsif ( $dates[2] =~ /^[2-9][0-9]$/gi ) {
    $dates[2] = $dates[2] + 1900;
  }
  if ( $dates[1] =~ /^\d$/gi ) { $dates[1] = "0" . $dates[1] }
  if ( $dates[0] =~ /^\d$/gi ) { $dates[0] = "0" . $dates[0] }
  my $date = join "-", @dates[ 2, 0, 1 ]; #joining the dates to be in yyyy-mm-dd format.
  print $date, "\n"; #double check
  print $date, ",", ( join ",", @lines[ 1 .. $#lines ] ), "\n"; #appending date to print the join of @lines.
}

Is there a way to apply the modifications to all of the desired fields at once, without having to split and join each of $lines[0] through $lines[2] separately?

__DATA__
12/23/2014|2/20/1995|3/25/1905|josh

Answer 1

Your script produces nonsensical output for the input you provided. The following output seems more logical to me:

#!/usr/bin/env perl

use strict;
use warnings;

while (my $line = <DATA>) {
    next unless $line =~ /\S/;
    chomp $line;    # otherwise the name field keeps its newline and a blank line is printed
    my ($name, @dates) = reverse split qr{\|}, $line;
    @dates = reverse map sprintf('%04d-%02d-%02d', (split qr{/})[2,0,1]), @dates;
    print join(',', @dates, $name), "\n";
}
__DATA__
12/23/2014|2/20/1995|3/25/1905|josh

Output:

2014-12-23,1995-02-20,1905-03-25,josh

If this is not the output you want, describe the exact output you are after.

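A few points:

while (<DATA>) reads a line from DATA into $_. Assign it to a meaningfully named variable instead; that makes the code easier to read.
Skip processing on blank lines.
Don't fall victim to LTS (leaning toothpick syndrome): /\|/ is harder to read than qr{\|} or even qr{ \| }x.
reverse makes the code easier to read, but with a large number of fields the reversals can become a real bottleneck. In that case, use pop and push.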
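
The last point can be sketched as follows. This is only an illustration, assuming the name is always the last field and every other field is an m/d/yyyy date; reformat_record is a hypothetical helper name, not something from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: reformat one pipe-delimited record without reverse.
sub reformat_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line;
    my $name   = pop @fields;    # assumption: the name is the last field

    # Every remaining field is assumed to be m/d/yyyy; rewrite it as yyyy-mm-dd.
    my @dates = map { sprintf '%04d-%02d-%02d', (split m{/})[2, 0, 1] } @fields;

    push @dates, $name;          # put the name back at the end
    return join ',', @dates;
}

print reformat_record('12/23/2014|2/20/1995|3/25/1905|josh'), "\n";
```

Because pop and push only touch the end of the array, the field order never has to be reversed and restored.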

Answer 2

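You can use a regular expression to match the parts of your data line and do away with the multiple splits. A regular expression also lets you validate the line format. Do you have a three-digit month? Do you have three dates? Validating your input is always a good idea: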

#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);

my $date_re = qr(
        ^(?<month1>\d{1,2})/
        (?<day1>\d{1,2})/
        (?<year1>\d{2,4})
        \|                    # Separator between date1 and date2
        (?<month2>\d{1,2})/
        (?<day2>\d{1,2})/
        (?<year2>\d{2,4})
        \|                    # Separator between date2 and date3
        (?<month3>\d{1,2})/
        (?<day3>\d{1,2})/
        (?<year3>\d{2,4})
        \|                    # Separator between date3 and name
        (?<name>.*)
    )x;
while ( my $line = <DATA> ) {
    if ( $line !~ $date_re ) {
        say "Something's wrong";
    }
    else {
        say "First Date: Year = $+{year1}  Month = $+{month1}  Day = $+{day1}";
        say "Second Date: Year = $+{year2}  Month = $+{month2}  Day = $+{day2}";
        say "Third Date: Year = $+{year3}  Month = $+{month3}  Day = $+{day3}";
        say "Name = $+{name}";
    }
}

__DATA__
12/23/2014|2/20/1995|3/25/1905|josh

Running this program prints:

First Date: Year = 2014  Month = 12  Day = 23
Second Date: Year = 1995  Month = 2  Day = 20
Third Date: Year = 1905  Month = 3  Day = 25
Name = josh

This uses some of the more advanced features of regular expressions:

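qr/.../ can be used to define a regular expression. Since your regular expression contains slashes, I decided to delimit it with parentheses instead, so it is qr(...).
The final )x means that I can use whitespace to make the regular expression easier to read. For example, I broke each date across three lines (month, day, year).
(?<name>...) names a capture group, which makes it easier to refer back to that specific group. I can use the %+ hash to recall my capture groups. For example, (?<month1>\d{1,2}) says that I expect a month of one or two digits. I store it in the capture group month1, and I can refer to it as $+{month1}.

One benefit of named capture groups is that they document what you are trying to capture.
{M,N} is a repetition count: the preceding item must occur between M and N times. \d{1,2} means that I expect one or two digits.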
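
The same features can also be used to reformat the dates instead of only printing their parts. The sketch below assumes four-digit years and applies one single-date pattern to each field in turn; $one_date_re and to_iso are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

# One date with named captures, reused for every date field.
my $one_date_re = qr{
    \A
    (?<month> \d{1,2} ) /
    (?<day>   \d{1,2} ) /
    (?<year>  \d{4}   )
    \z
}x;

# Hypothetical helper: returns yyyy-mm-dd, or nothing if the field
# does not look like m/d/yyyy.
sub to_iso {
    my ($field) = @_;
    return unless $field =~ $one_date_re;
    return sprintf '%04d-%02d-%02d', $+{year}, $+{month}, $+{day};
}

print to_iso('2/20/1995'), "\n";
```

A field that fails the pattern (for example a three-digit month) yields no value, so the caller can decide how to report the bad line.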

Answer 3

You still have to split the fields and join them again, but you can reduce the substitutions to:

$dates[2] =~ s/^([01]\d)$/20$1/;
$dates[2] =~ s/^([2-9]\d)$/19$1/;
$dates[1] =~ s/^(\d)$/0$1/;
$dates[0] =~ s/^(\d)$/0$1/;
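
One way to avoid repeating these lines for each field is to wrap them in a sub and map it over an array slice. This is a minimal sketch, assuming the first three fields are the dates; normalize_date is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the four substitutions above.
sub normalize_date {
    my ($field) = @_;
    my @dates = split m{[/-]}, $field;    # month, day, year
    $dates[2] =~ s/^([01]\d)$/20$1/;      # 00-19 becomes 2000-2019
    $dates[2] =~ s/^([2-9]\d)$/19$1/;     # 20-99 becomes 1920-1999
    $dates[1] =~ s/^(\d)$/0$1/;           # zero-pad the day
    $dates[0] =~ s/^(\d)$/0$1/;           # zero-pad the month
    return join '-', @dates[ 2, 0, 1 ];
}

my @lines = split /\|/, '1/23/14|2/20/95|3/25/1905|josh';

# Slice on both sides: fields 0..2 are replaced, the name is left alone.
@lines[ 0 .. 2 ] = map { normalize_date($_) } @lines[ 0 .. 2 ];
print join( ',', @lines ), "\n";
```

The split and join still happen once per date, but the calling code only mentions the slice of fields it wants changed.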

Answer 4

It's ugly, but it does all the fields in one pass, as you asked.

while (<DATA>) {
    s/
      (?:^|\|)\K # start after a leading start-of-line or pipe
      (\d{1,2})
      [\/-]
      (\d{1,2})
      [\/-]
      (\d\d(?:\d\d)?)
      (?=\||\z) # look-ahead to see trailing pipe or end-of-string
     /
        sprintf('%04d-%02d-%02d',
            $3 <  20 ? $3 + 2000
          : $3 < 100 ? $3 + 1900
          : $3,
            $1,
            $2
        )
     /gex;
    print;

}

__DATA__
1/23/14|2/20/95|3/25/1905|josh

Modify multiple elements of an array in Perl
