Perl - Find, format and replace strings in a file

Time: 2014-11-10 22:35:04

Tags: perl replace find format substring

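I have a csv file in which I store times in the format h:m:s, and I would like to convert these times to a number representing the total seconds. For example, if I have 1:02:34 I would like to replace it with 1 * 3600 + 2 * 60 + 34 = 3754.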
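The arithmetic above can be sketched for a single value as follows. This is only an illustration: `hms_to_seconds` is a hypothetical helper name that does not appear in the question, and it assumes the input always has exactly three colon-separated numeric fields.

```perl
use strict;
use warnings;

# hms_to_seconds is a hypothetical helper name, not part of the question's code.
# Assumes exactly three numeric fields separated by colons.
sub hms_to_seconds {
    my ($hms) = @_;
    my ( $h, $m, $s ) = split /:/, $hms;
    return $h * 3600 + $m * 60 + $s;
}

print hms_to_seconds('1:02:34'), "\n";    # 1*3600 + 2*60 + 34 = 3754
print hms_to_seconds('0:8:0'),   "\n";    # single-digit fields work too: 480
```

Perl converts strings such as `02` to numbers on its own, so the leading zeros need no special handling.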

What I want to do is the following:

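  1. Find all times in h:m:s format
  2. Convert each one to seconds
  3. Replace the time in h:m:s format with the converted number (in seconds)

Of course, I would like to make all these changes in a single pass through the file. But I am still stuck on the substitution with the formatted variable and on writing the result back to the file. If anyone could point me in the right direction I would be very grateful, and I would like to know whether doing this in one pass is possible at all.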

Thanks, CJ

This is what the data looks like:

    Column,Column,Column,Column,Column,Column,Column,Column,Column
    1408319018,0:0:28,0:00:00,0:01:00,0:00:00,0:06:16,NA:NA:NA,0:07:32,0:8:0
    1408313536,0:2:6,0:00:01,0:01:00,0:00:00,0:06:20,NA:NA:NA,0:07:40,0:9:46
    1408319031,0:0:24,0:00:00,0:01:07,0:00:00,0:07:06,NA:NA:NA,0:08:30,0:8:54
    1408319018,0:2:21,0:00:01,0:00:54,0:00:00,0:00:37,NA:NA:NA,0:01:51,0:4:12
    1408319037,1:51:13,0:00:01,0:01:13,0:00:01,0:18:09,NA:NA:NA,0:19:41,2:10:54
    1408319031,1:58:18,0:00:01,0:00:55,0:00:00,0:00:18,NA:NA:NA,0:01:30,1:59:48
    

This is what my code looks like so far:

    #!/usr/bin/perl
    
    use strict;
    #use warnings;
    
    my $line;
    my $file = "bla.csv";
    my ($formatTime0,$formatTime1,$formatTime2,$formatTime3,$formatTime4,$formatTime5,$formatTime6);
    
    open(my $OUTPUT, '+<'. $file);
    
    while( $line = <$OUTPUT> ) {
    
        $formatTime0 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[0] );
        $formatTime1 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[1] );
        $formatTime2 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[2] );
        $formatTime3 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[3] );
        $formatTime4 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[4] );
        $formatTime5 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[5] );
        $formatTime6 = formatTime( ($line =~ /,(\d:\d*:\d*)/g)[6] );
    
        print $formatTime0."\t".$formatTime1."\t".$formatTime2."\t".$formatTime3."\t".$formatTime4."\t".$formatTime5."\t".$formatTime6."\n";
    }
    
    close $OUTPUT;
    
    sub formatTime {
        my $time2format = $_[0];
    
        my (@temp)  = ($time2format =~ /(\d).*(\d\d).*(\d\d)/);
    
        my $seconds = $temp[2];
        my $minutes = $temp[1];
        my $hours   = $temp[0];
    
        if ($minutes > 0) {
            $minutes = $minutes * 60;
        }
        if ($hours > 0) {
            $hours = $hours * 3600;
        }
    
        my $timeINsec = $hours + $minutes + $seconds;
        return $timeINsec;
    }
    

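2 answers:

Answer 0 (score: 3)

This code uses an executable replacement (the /e modifier on the substitution) to compute the number of seconds for each time field.

Setting $^I = '.orig' turns on in-place editing: Perl writes the changes back to the input file and keeps the original contents in a backup file with the same name plus .orig appended.

The program expects the path to the input file as a parameter on the command line, so it should be run like this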

    perl format_time.pl mydata.txt

    use strict;
    use warnings;

    $^I = '.orig';

    while (<>) {
      s{ \b (\d{1,2}) : (\d{1,2}) : (\d{1,2}) \b }{ ($1 * 60 + $2) * 60 + $3 }gxe;
      print;
    }

Output

    Column,Column,Column,Column,Column,Column,Column,Column,Column
    1408319018,28,0,60,0,376,NA:NA:NA,452,480
    1408313536,126,1,60,0,380,NA:NA:NA,460,586
    1408319031,24,0,67,0,426,NA:NA:NA,510,534
    1408319018,141,1,54,0,37,NA:NA:NA,111,252
    1408319037,6673,1,73,1,1089,NA:NA:NA,1181,7854
    1408319031,7098,1,55,0,18,NA:NA:NA,90,7188
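To try the in-place edit from this answer without touching real data, the following sketch runs the same substitution against a scratch file. The file name `sample.csv`, the temporary directory and the single sample row are assumptions made for the demonstration; `local` keeps `$^I` and `@ARGV` confined to one block so the rest of the script is unaffected.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample file in a scratch directory (names chosen for illustration).
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample.csv";

open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "1408319037,1:51:13,NA:NA:NA,2:10:54\n";
close $fh;

{
    # Localise the globals so in-place editing stays confined to this block.
    local ( $^I, @ARGV ) = ( '.orig', $file );
    while (<>) {
        # /e evaluates the replacement as Perl code, /x allows the spacing,
        # /g repeats the substitution for every time on the line.
        s{ \b (\d{1,2}) : (\d{1,2}) : (\d{1,2}) \b }{ ($1 * 60 + $2) * 60 + $3 }gxe;
        print;    # goes to the rewritten file, not to the terminal
    }
}

# Read both files back: the rewritten file and the untouched backup.
for my $path ( $file, "$file.orig" ) {
    open my $in, '<', $path or die "Cannot read $path: $!";
    print "$path: ", <$in>;
    close $in;
}
```

After the block, the rewritten file holds `1408319037,6673,NA:NA:NA,7854`, while the `.orig` copy still holds the original row. The `NA:NA:NA` field is left alone because it contains no digits for the pattern to match.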

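Answer 1 (score: -1)

I would suggest a function that turns one of your tuples into what you want.

Then cut off the leading number and let that function do its work on each tuple.

Here is my take: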

    open my $in, '<', "file.txt" or die "Cannot read file.txt: $!";
    my @lines;
    while ( my $line = <$in> ) {
        chomp $line;    # drop the newline so the last field is clean

        next unless $line =~ s/^\d+,//;    # remove beginning number, skip Column line

        my @tuples = split( ",", $line );    # I kept the N/A values, to discard:
        # my @tuples = grep { $_ !~ /[a-z]/i } split( ",", $line );

        @tuples = map { tuple_to_seconds($_) } @tuples;

        push @lines, join( ",", @tuples );
        # I printed with ",", choose what you like best
    }
    close $in;

    # reopen the same file for writing; this overwrites the original contents
    open my $out, '>', "file.txt" or die "Cannot write file.txt: $!";
    print $out join( "\n", @lines ), "\n";
    close $out;

    sub tuple_to_seconds {
        # takes a tuple and returns N/A for N/A, seconds for a valid number tuple
        my $tuple = shift;
        return "N/A" if $tuple =~ /[a-z]/i;
        my ( $h, $m, $s ) = split( ":", $tuple );

        return $h * 3600 + $m * 60 + $s;
    }
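The code above drops the header line and the leading number. If those should be kept, one possible variation of the same per-field idea is sketched below. `field_to_seconds` and `convert_line` are hypothetical names, and the sketch assumes a simple CSV without quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper: converts one field and passes anything that is not
# a h:m:s time (header text, the leading number, NA:NA:NA) through unchanged.
sub field_to_seconds {
    my ($field) = @_;
    return $field unless $field =~ /\A(\d+):(\d+):(\d+)\z/;
    return $1 * 3600 + $2 * 60 + $3;
}

# Hypothetical helper: converts every field of one CSV line.
sub convert_line {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields, if there are any.
    return join ',', map { field_to_seconds($_) } split /,/, $line, -1;
}

print convert_line($_), "\n" for (
    "Column,Column,Column,Column\n",
    "1408319037,1:51:13,NA:NA:NA,2:10:54\n",
);
```

For the two sample lines this prints the header unchanged, followed by `1408319037,6673,NA:NA:NA,7854`.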