Improving a command that calculates the number of days

Time: 2014-12-26 17:13:58

Tags: unix awk

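I would like to generate a report that calculates the number of days each material has been in the warehouse. The number of days is the difference between the date the material came in (field $3) and a manually supplied reference date (01 OCT 2014).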

Input.csv

Des11,Material,DateIN,Des22,Des33,MRP,Des44,Des55,Des66,Location,Des77,Des88
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,yyy,05-FEB-14.09:02:09,cc,dd,x20,ee,ff,gg,YY250,hh,jj
aa,yyy,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj

Currently I am using the command below to compute the ageing (number of days) in field $13 (thanks to gboffi):

awk -F, 'NR>0  {date=$3;
                  gsub("[-.]"," ",date);
                  printf $0 ",";system("date --date=\"" date "\" +%s")}
  '  Input.csv | awk -F, -v OFS=, -v now=`date --date="01 OCT 2014 " +%s` '
                  NR>0  {$13=now-$13; $13=$13/24/3600;print $0}' >Op_Step11.csv

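Under Cygwin (Windows), the above command takes 50 minutes for a sample input of 1 lakh (100,000) rows. Since my actual input file contains 25 million rows, the script looks like it would take days. I am looking for suggestions to improve the command.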

Expected output:

Des11,Material,DateIN,Des22,Des33,MRP,Des44,Des55,Des66,Location,Des77,Des88,Ageing-NoOfDays
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj,42.6611
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj,42.6611
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj,109.621
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj,109.621
aa,yyy,05-FEB-14.09:02:09,cc,dd,x20,ee,ff,gg,YY250,hh,jj,237.624
aa,yyy,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj,237.624
aa,zzz,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj,237.624
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj,476.787
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj,476.787
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj,476.787

I am not allowed to change the input format, and I have no access to Perl or Python.

Update #3:

BEGIN{ FS=OFS=","} 
{ 
t1=$3
t2="01-OCT-14.00:00:00"
print $0,(cvttime(t2) - cvttime(t1))/24/3600
}

function cvttime(t,     a) {
        split(t,a,"[-.:]")
        match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",a[2])
        a[2] = sprintf("%02d",(RSTART+2)/3)
        return( mktime("20"a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]) )
}

2 answers:

Answer 0: (score: 2)

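Since you are on Cygwin, you are using GNU awk, which has its own built-in time functions, so you do not need to call the shell date command at all. Just tweak this old command of mine to suit your input and output formats: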

function cvttime(t,     a) {
        split(t,a,"[/:]")
        match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
        a[2] = sprintf("%02d",(RSTART+2)/3)
        return( mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]) )
}
BEGIN{
t1="01/Dec/2005:00:04:42"
t2="01/Dec/2005:17:14:12"
print cvttime(t2) - cvttime(t1)
}

It relies on GNU awk for the time functions; see http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions
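As a rough illustration of that adaptation, here is a minimal sketch for the DD-MON-YY.HH:MM:SS format in the question. It assumes an awk that provides mktime() (GNU awk does), that every two-digit year belongs to 20YY, and that the first row is the header. The function name add_ageing is hypothetical and only wraps the awk program so it can read CSV on standard input.

```shell
#!/bin/sh
# Sketch only: append an Ageing-NoOfDays column using the awk mktime()
# extension instead of spawning one `date` process per row.
# add_ageing is a hypothetical helper name; it reads CSV on stdin.
add_ageing() {
    awk '
    # Convert "DD-MON-YY.HH:MM:SS" to seconds since the epoch.
    function cvttime(t,    a) {
        split(t, a, "[-.:]")
        # Month name to number: position in the string, 3 letters per month.
        match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper(a[2]))
        a[2] = sprintf("%02d", (RSTART + 2) / 3)
        # Assumption: two-digit years all mean 20YY.
        return mktime("20" a[3] " " a[2] " " a[1] " " a[4] " " a[5] " " a[6])
    }
    BEGIN {
        FS = OFS = ","
        # Reference date from the question, converted once up front.
        ref = cvttime("01-OCT-14.00:00:00")
    }
    NR == 1 { print $0, "Ageing-NoOfDays"; next }   # header row
    { print $0, (ref - cvttime($3)) / 86400 }       # seconds to days
    '
}
```

It could be run as `add_ageing < Input.csv > Op_Step11.csv`. Note that mktime() interprets the timestamp in the local time zone, so in zones with daylight saving time the fractional day counts may differ by an hour from a plain calendar difference.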

Answer 1: (score: 0)

Here is an example in Perl:

use feature qw(say);
use strict;
use warnings;

use Text::CSV;
use Time::Piece;

my $csv = Text::CSV->new;
my $te = Time::Piece->strptime('01-OCT-14', '%d-%b-%y');
my $fn = 'Input.csv';
open (my $fh, '<', $fn) or die "Could not open file '$fn': $!\n";
chomp(my $head = <$fh>);
say "$head,Ageing-NoOfDays";
while (my $line = <$fh>) {
    chomp $line;
    if ($csv->parse($line)) {
        my $t = ($csv->fields())[2];
        my $tp = Time::Piece->strptime($t, '%d-%b-%y.%T');
        my $s = $te - $tp;
        say "$line," . $s->days;
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
close($fh);