I'm having trouble converting an "ugly" CSV into a "pretty" one. For example, I have:
something,epochtime,time-human-readable,some,header,for,the,values,here
same,time-a,don-t_care,a,b,,,,
same,time-a,don-t_care,,,,,c,
same,time-a,don-t_care,,,,,,d
same,time-a,don-t_care,,,e,f,,
same,time-b,don-t_care,g,h,,,,
same,time-b,don-t_care,,,i,j,,
same,time-b,don-t_care,,,,,,k
same,time-b,don-t_care,,,,,l,
same,time-c,don-t_care,,,m,n,,
same,time-c,don-t_care,,,,,o,
same,time-c,don-t_care,p,q,,,,
same,time-c,don-t_care,,,,,,r
But what I need is:
something,epochtime,time-human-readable,some,header,for,the,values,here
same,time-a,don-t_care,a,b,e,f,c,d
same,time-b,don-t_care,g,h,i,j,l,k
same,time-c,don-t_care,p,q,m,n,o,r
In the data rows, lines that share the same epochtime each carry only part of the values.
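I tried to solve this with sed/awk, but with my limited skills I got nowhere.
Any solution that can be run from crontab is welcome, but bash/sed/awk/perl/python, or any command-line tool that can be installed with "apt-get install ...", is preferred. The host OS is Xubuntu 16.04 LTS.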
Addendum (2018-10-16 13:55 UTC):
The values may include - or _ but contain no spaces or commas, so there are no string-handling headaches, e.g.:
dummy,1539697764,2018-10-16_13-49-24,p,q,,,,
Answer 0 (score: 2)
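$ cat tst.awk
BEGIN { FS=OFS="," }

# A new value in field 2 starts a new group: flush the previous group first.
$2 != prev { if (NR>1) prt(); prev=$2 }

# Remember every non-empty field of the current line, by column position.
{
    for (i=1; i<=NF; i++) {
        if ($i != "") {
            rec[i] = $i
        }
    }
}

END { prt() }

# Print the merged record for the current group, then reset it.
function prt() {
    for (i=1; i<=NF; i++) {
        printf "%s%s", rec[i], (i<NF ? OFS : ORS)
    }
    delete rec
}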
$ awk -f tst.awk file
something,epochtime,time-human-readable,some,header,for,the,values,here
same,time-a,don-t_care,a,b,e,f,c,d
same,time-b,don-t_care,g,h,i,j,l,k
same,time-c,don-t_care,p,q,m,n,o,r
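The same streaming idea can also be sketched in Python, which the question lists as acceptable. This is only a minimal sketch: the function name `merge_rows` is made up for illustration, and it assumes that every line has the same number of fields, that rows with the same epochtime are adjacent, and that no field contains a comma (as stated in the addendum). It avoids f-strings so that it should also run on the Python 3.5 that ships with Ubuntu 16.04.

```python
def merge_rows(lines):
    """Merge consecutive lines that share the same second field (epochtime)."""
    merged = []      # finished output lines
    current = None   # fields accumulated for the current key
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if current is not None and fields[1] == current[1]:
            # Same key: overlay every non-empty field onto the accumulated row.
            current = [new if new != "" else old
                       for old, new in zip(current, fields)]
        else:
            # New key: flush the previous group and start a new one.
            if current is not None:
                merged.append(",".join(current))
            current = fields
    if current is not None:
        merged.append(",".join(current))
    return merged


# Demonstration on the header and the time-a rows from the question.
sample = [
    "something,epochtime,time-human-readable,some,header,for,the,values,here",
    "same,time-a,don-t_care,a,b,,,,",
    "same,time-a,don-t_care,,,,,c,",
    "same,time-a,don-t_care,,,,,,d",
    "same,time-a,don-t_care,,,e,f,,",
]
for out_line in merge_rows(sample):
    print(out_line)
```

As in the awk script, the header needs no special handling here: its second field ("epochtime") differs from every data key, so it is flushed unchanged as its own group.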
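Answer 1 (score: 1)
A Perl version that uses a CSV parser instead of naively splitting on commas, to be more robust. You mentioned that some columns are strings, so this also handles cases where a value has an embedded comma and the like.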
#!/usr/bin/perl
use strict;
use warnings;
# Install the following non-core modules through your
# OS package manager or favorite CPAN client.
use List::MoreUtils qw/pairwise/;
use Text::CSV;

my $csv = Text::CSV->new({ auto_diag => 2, blank_is_undef => 1 });
my $header = <>;
print $header;
my $merged = $csv->getline(\*ARGV);
while (my $cols = $csv->getline(\*ARGV)) {
    if ($merged->[1] ne $cols->[1]) {
        $csv->say(\*STDOUT, $merged);
        $merged = $cols;
    } else {
        $merged = [ pairwise { $a // $b } @$merged, @$cols ];
    }
}
$csv->say(\*STDOUT, $merged);
Running it:
$ perl merge.pl data.csv
something,epochtime,time-human-readable,some,header,for,the,values,here
same,time-a,don-t_care,a,b,e,f,c,d
same,time-b,don-t_care,g,h,i,j,l,k
same,time-c,don-t_care,p,q,m,n,o,r
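A comparable parser-based approach can be sketched in Python with the standard `csv` module, so that quoted fields with embedded commas survive the merge. The function name `merge_csv` and its `key_index` parameter are illustrative assumptions, not part of the Perl script. The sketch assumes that rows with the same key are adjacent and that every non-blank row has the same number of columns.

```python
import csv
import io
from itertools import groupby


def merge_csv(infile, outfile, key_index=1):
    """Merge consecutive CSV rows that share the same value in column key_index."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader, None)
    if header is None:                      # empty input: nothing to do
        return
    writer.writerow(header)
    rows = (row for row in reader if row)   # skip blank lines
    for _, group in groupby(rows, key=lambda row: row[key_index]):
        merged = next(group)                # first row of the group
        for row in group:
            # Keep a value that was already seen; otherwise take the new one.
            merged = [old or new for old, new in zip(merged, row)]
        writer.writerow(merged)


# Small demonstration with one quoted field that contains a comma.
sample = 'name,epochtime,x,y\nsame,t1,"a,1",\nsame,t1,,b\n'
out = io.StringIO()
merge_csv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

To process real files, open them with `newline=""` as the `csv` module documentation recommends, and pass the file objects in place of the `io.StringIO` buffers.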
Answer 2 (score: 0)
Another Perl solution:
open $CSV, "<" , "ugly.csv";
@R=();
while (<$CSV>) {
    if ($.==1 ) { print ; next; }
    chomp;
    @F=split(/,/,$_);
    $k=join(",",@F[0..2]);
    if( $k ne $prevk ) { @R=() }
    push(@R,@F[3..9],"|");
    $hash{"$k"}=join(",",@R);
    $prevk=$k;
}
foreach $val (sort keys %hash)
{
    @arr=split(/\|/,$hash{$val});
    $x=join("",reverse sort @arr);
    $x=~s/(^[,])|([,]{2,})/$1 eq "," ? "" : ","/eg;
    print "$val,$x\n";
}
Shell output (note that each data row ends with an extra trailing comma):
$ perl -f ugly_csv.pl
something,epochtime,time-human-readable,some,header,for,the,values,here
same,time-a,don-t_care,a,b,e,f,c,d,
same,time-b,don-t_care,g,h,i,j,l,k,
same,time-c,don-t_care,p,q,m,n,o,r,
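A variation on the keyed-hash idea, sketched in Python: rows are grouped by their first three columns, but the values are merged by column position instead of by sorting strings, so each value stays in its own column and no trailing comma appears. The name `merge_by_key` and the `key_cols` parameter are hypothetical. The sketch assumes comma-free values and equal field counts, and it uses `OrderedDict` because the Python 3.5 on Ubuntu 16.04 does not guarantee insertion order for a plain `dict`.

```python
from collections import OrderedDict


def merge_by_key(lines, key_cols=3):
    """Group rows by their first key_cols fields and merge the rest by position."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if not cleaned:
        return []
    header, rows = cleaned[0], cleaned[1:]
    merged = OrderedDict()   # key tuple -> list of merged value fields
    for row in rows:
        fields = row.split(",")
        key = tuple(fields[:key_cols])
        values = fields[key_cols:]
        if key not in merged:
            merged[key] = values
        else:
            # Keep a value that was already seen; otherwise take the new one.
            merged[key] = [old or new
                           for old, new in zip(merged[key], values)]
    return [header] + [",".join(list(key) + values)
                       for key, values in merged.items()]


# Demonstration: rows for the two keys are interleaved, not adjacent.
sample = [
    "something,epochtime,time-human-readable,x,y",
    "same,time-a,don-t_care,a,",
    "same,time-b,don-t_care,c,",
    "same,time-a,don-t_care,,b",
    "same,time-b,don-t_care,,d",
]
for out_line in merge_by_key(sample):
    print(out_line)
```

Because the groups live in a dictionary, rows with the same key do not have to be adjacent in the input. The trade-off is that the whole file is held in memory before anything is printed.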