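I want to print the missing sequence gaps (first missing sequence number, last missing sequence number) found in the first column. For each gap I also need the minimum and maximum sequence number of the first column, together with the combination of fields $2, substr($3,4,6), substr($4,4,6), $6, $8, $10. The input file is not sorted by the first column.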
Input.csv
21,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
22,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
23,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
24,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
28,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
32,abc,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
38,abc,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
41,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
46,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
51,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
52,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
I tried this command and got partial output:
cat Input.csv | \
awk -F, '{OFS=","; print $1,$2,substr($3,4,6),substr($4,4,6),$6,$8,$10}' | \
sort -k1 -t, | \
awk -F, 'BEGIN {OFS=","} (($1!=p+1) && ($7==p7)) {print p,p2,p3,p4,p5,p6,p7,p+1 "," $1-1,$1} {p=$1;p2=$2;p3=$3;p4=$4;p5=$5;p6=$6;p7=$7}'
Output of the above command, with column headings:
Minimum Seq ($1),$2,substr($3,4,6),substr($4,4,6),$6,$8,$10,start Missing Seq ($1),End Missing Seq ($1),Maximum Seq ($1)
24,abc,JUN-12,JUN-12,1,RO0412,L7,25,27,28
32,abc,MAY-13,MAY-13,1350,RO0213,K1,33,37,38
41,abc,FEB-14,FEB-14,650,EN1113,S317,42,45,46
46,abc,FEB-14,FEB-14,650,EN1113,S317,47,50,51
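In the output above, the Minimum Seq ($1) and Maximum Seq ($1) values are not what I expect. Please help. For example, in the first output line the minimum seq should be 21 rather than 24, and in the third output line the maximum seq should be 52 rather than 46.
Expected output: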
## $2,$3,$4,$6,$8,$10,"start Missing Seq ($1), ",End Missing Seq ($1) ,Minimum Seq ($1),Maximum Seq ($1) ##
abc,JUN-12,JUN-12,1,ROTN0412,L7,25,27,21,28
abc,MAY-13,MAY-13,1350,ROTN0213,K1,33,37,32,38
abc,FEB-14,FEB-14,650,CHEN1113,S317,42,45,41,52
abc,FEB-14,FEB-14,650,CHEN1113,S317,47,50,41,52
Answer 0 (score: 0)
You can try the following Perl script:
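#! /usr/bin/perl
use warnings;
use strict;
use File::Slurp qw(read_file);
use List::Util qw(min max);

my @lines = read_file('Input.csv');
my $ll = sortLines(\@lines);    # sort numerically by the first column
$ll = reduceFields($ll);        # keep only the fields needed for the output
my $rr = findRanges($ll);       # minimum and maximum sequence per group key
printMissingSeqs($rr, $ll);

# Print one line for every gap between consecutive sequence numbers
# that belong to the same group (same value in the last reduced field).
sub printMissingSeqs {
    my ($rr, $ll) = @_;
    my $pkey = ""; my $pss; my $i = 0;
    for (@$ll) {
        my @f = split(/,/);
        my $key = $f[6];
        my $ss = $f[0];
        $pss = $ss if $i == 0;
        if (($key eq $pkey) && ($ss - $pss) > 1) {
            print join(",", (@f[1..6], $pss + 1, $ss - 1, @{$rr->{$key}})) . "\n";
        }
        $pkey = $key; $pss = $ss;
        $i++;
    }
}

# Collect all sequence numbers per group key and return a hash ref
# mapping each key to [min, max].
sub findRanges {
    my ($ll) = @_;
    my %temp;
    my %rr;
    for (@$ll) {
        my @f = split(/,/);
        push(@{$temp{$f[6]}}, $f[0]);
    }
    for (keys %temp) {
        my $min = min(@{$temp{$_}});
        my $max = max(@{$temp{$_}});
        $rr{$_} = [$min, $max];
    }
    return \%rr;
}

# Reduce each line to: $1, $2, substr($3,4,6), substr($4,4,6), $6, $8, $10
# (awk numbering; Perl indices and substr offsets are zero-based).
sub reduceFields {
    my ($ll) = @_;
    my @a;
    for (@$ll) {
        my @f = split(/,/);
        my $line = join(",", ($f[0], $f[1], substr($f[2], 3, 6), substr($f[3], 3, 6), $f[5], $f[7], $f[9]));
        push(@a, $line);
    }
    return \@a;
}

# Sort the raw lines numerically by the first comma-separated field.
sub sortLines {
    my ($lines) = @_;
    my @a = sort { my ($keyA) = $a =~ /(.*?),/; my ($keyB) = $b =~ /(.*?),/; $keyA <=> $keyB } @$lines;
    return \@a;
}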
Output:
abc,JUN-12,JUN-12,1,RO0412,L7,25,27,21,28
abc,MAY-13,MAY-13,1350,RO0213,K1,33,37,32,38
abc,FEB-14,FEB-14,650,EN1113,S317,42,45,41,52
abc,FEB-14,FEB-14,650,EN1113,S317,47,50,41,52
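For comparison, the same result can probably be reached with sort and awk alone, which stays closer to the pipeline in the question. What follows is only a minimal sketch under stated assumptions: the function name print_gaps is made up for illustration, and, like the Perl script, it treats rows as one group when field 10 matches. It sorts with `sort -t, -k1,1n` so that the first field is compared numerically; the question's `sort -k1 -t,` compares text, which would misorder values with different digit counts (for example 9 and 10).

```shell
# Sketch: report gaps in column 1 together with the per-group min/max.
# Assumption: rows belong to the same group when field 10 matches,
# mirroring the key used by the Perl script.
# print_gaps is a hypothetical helper name, not part of any tool.
print_gaps() {
    sort -t, -k1,1n "$1" |
    awk -F, -v OFS=, '
        {
            key = $10
            seq[NR] = $1 + 0
            grp[NR] = key
            # Reduced record: $2, month-year part of $3 and $4, $6, $8, $10
            rec[NR] = $2 OFS substr($3, 4, 6) OFS substr($4, 4, 6) OFS $6 OFS $8 OFS $10
            # Input is sorted numerically, so the first value seen for a key
            # is its minimum and the last value seen is its maximum.
            if (!(key in min)) min[key] = seq[NR]
            max[key] = seq[NR]
        }
        END {
            # A gap exists when two neighbouring rows of the same group
            # differ by more than 1 in column 1.
            for (i = 2; i <= NR; i++)
                if (grp[i] == grp[i-1] && seq[i] - seq[i-1] > 1)
                    print rec[i], seq[i-1] + 1, seq[i] - 1, min[grp[i]], max[grp[i]]
        }'
}
```

It would be called as `print_gaps Input.csv`. The minimum and maximum are only known once the whole file has been read, which is why the gaps are printed in the END block rather than row by row as in the question's attempt.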