My string is:
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE
....
I need to produce this format using standard Linux commands such as awk or ..., or a Perl function:
AA,BB,C,DD,E
aaaaaaaa,bbbbbb,ccccc,dddddd,eeeee
aaaaaaaa2,bbbbbb2,ccccc2,dddddd2,eeeee2
e.g.:
OUTPUT_STRING | awk ....
or
perlFunction(OUTPUT_STRING){
.....
return formatted_string;
}
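I searched Google and tried many suggestions from other sites, but none of them worked, so please don't send me links.
Some fields have a single colon (:) and some have a double colon (::); which one appears is random.
I tried the following, but none of it helped: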
sed -r 's/\\,|,|CN=|OU*//g' |awk -F "|=|:" '{printf $2"|"}'
or
sed -n '1h; 2,$H;${g;s/\n/,/g;p}' | sed 's/,,/\n/g'
or
awk -F ":" '{printf $2} {if (NF==0) {printf "\n"}}' | sed "s/ //" | sed "s/ /;/g"
Answer 0 (score: 4)
One of many ways to get the desired result:
use strict;
use warnings;

my $file = do { local $/; <DATA> };   # read whole file
my @blocks = split /\n\n/, $file;     # split file into blocks
my $print_header = 1;                 # flag to print header

foreach my $block (@blocks) {                  # process each block
    $block =~ s/:+/:/g;                        # clean up the block :: -> :
    my @lines = split /\n/, $block;            # split the block into lines
    my (@header, @data);                       # arrays to store header and data
    foreach my $line (@lines) {                # process each line
        my ($h, $d) = split /:\s*/, $line;     # split line into header and data part
        push @header, $h;                      # add header names into array
        push @data, $d;                        # add data into array
    }
    if ($print_header) {                       # if header not printed yet
        print join(',', @header) . "\n";       # print header array
        $print_header = 0;                     # flag the header is printed
    }
    print join(',', @data) . "\n";             # print data array
}
__DATA__
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE2
Output
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE2
Answer 1 (score: 2)
This GNU awk command should do it:
awk -v RS='' -F':* ?|\n' 'NR==1{print $1","$3","$5","$7","$9} {print $2","$4","$6","$8","$10}' t
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
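RS=''
sets the record separator to the empty string, so awk works in paragraph (block) mode: each group of lines separated by blank lines is one record.
-F':* ?|\n'
sets the field separator to ":" or "::" followed by an optional space, or to a newline.
NR==1{print $1","$3","$5","$7","$9}
for the first record only, prints the header (the odd-numbered fields).
{print $2","$4","$6","$8","$10}
prints the data fields (the even-numbered fields) for every record.

A more general solution that works with any number of fields: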
awk -v RS='' -F':* ?|\n' 'NR==1{for(i=1;i<=NF-2;i+=2) printf "%s,",$i;print $i} {for(i=2;i<=NF-2;i+=2) printf "%s,",$i;print $i}' file
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
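Both commands above are meant for GNU awk, and the field separator ':* ?' can also match the empty string, which other awk implementations may treat differently. A minimal sketch that should work in any POSIX awk, assuming every block holds the same keys in the same order, uses one field per line and cuts each line at its first colon. The function name blocks_to_csv is hypothetical, not from the answer.

```shell
# blocks_to_csv (hypothetical name): a POSIX awk sketch.
# Assumes blank-line-separated blocks of "KEY: value" or "KEY:: value" lines,
# with the same keys in the same order in every block.
blocks_to_csv() {
  awk '
    BEGIN { RS = ""; FS = "\n" }      # paragraph mode; one field per line
    {
      hdr = ""; row = ""
      for (i = 1; i <= NF; i++) {
        p = index($i, ":")            # position of the first colon
        key = substr($i, 1, p - 1)
        val = substr($i, p + 1)
        sub(/^:*[ \t]*/, "", val)     # drop a second ":" and leading blanks
        sep = (i > 1) ? "," : ""
        hdr = hdr sep key
        row = row sep val
      }
      if (NR == 1) print hdr          # header comes from the first block
      print row
    }
  '
}
```

Usage: blocks_to_csv < file. Cutting at the first colon with index() instead of using a regex field separator means a colon inside a value stays part of the value.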
PS: If not every record has all of the IDs, it becomes a whole different story.
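For the case in the PS, one possible approach (a sketch, not part of the original answer) is to read all blocks first, build the header from the union of keys in first-seen order, and leave a cell empty when a block lacks that key. The helper name blocks_to_csv_union is hypothetical.

```shell
# blocks_to_csv_union (hypothetical name): handles blocks with missing keys.
# Header = union of all keys in first-seen order; missing cells stay empty.
# All blocks are kept in memory until END, then printed.
blocks_to_csv_union() {
  awk '
    BEGIN { RS = ""; FS = "\n" }                # paragraph mode; one field per line
    {
      for (i = 1; i <= NF; i++) {
        p = index($i, ":")
        if (p == 0) continue                    # skip lines without a key
        key = substr($i, 1, p - 1)
        val = substr($i, p + 1)
        sub(/^:*[ \t]*/, "", val)               # drop a second ":" and leading blanks
        if (!(key in seen)) { seen[key] = 1; keys[++nkeys] = key }
        cell[NR, key] = val                     # remember value for block NR
      }
    }
    END {
      for (k = 1; k <= nkeys; k++)
        printf "%s%s", keys[k], (k < nkeys ? "," : "\n")
      for (r = 1; r <= NR; r++)
        for (k = 1; k <= nkeys; k++)
          printf "%s%s", cell[r, keys[k]], (k < nkeys ? "," : "\n")
    }
  '
}
```

For example, a block "AA: a / BB:: b" followed by a block "AA: a2 / C: c2" gives the header AA,BB,C and the rows a,b, and a2,,c2. Because everything is buffered until the end, memory use grows with the size of the input.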
Answer 2 (score: 0)
Use Text::CSV to handle edge cases:
use strict;
use warnings;
use Text::CSV 'csv';

my $input = do { local $/; readline };   # input from STDIN or filename argument
my @aoh;
my %headers;
foreach my $block (split /\n\n+/, $input) {
    my %row;
    foreach my $line (split /^/, $block) {
        if ($line =~ m/^([^:]+):+\s*(.*)$/) {
            $row{$1} = $2;
            $headers{$1} = 1;
        }
    }
    push @aoh, \%row;
}
csv(in => \@aoh, out => *STDOUT, headers => [sort keys %headers],
    encoding => 'UTF-8', auto_diag => 2);
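One of the edge cases Text::CSV takes care of is quoting: a value that contains a comma or a double quote must be quoted, or the columns shift. If installing the module is not an option, a rough awk stand-in might look like the sketch below. It assumes RFC 4180-style quoting is what the consumer expects and, like the plain awk answers, that every block has the same keys; blocks_to_csv_quoted is a hypothetical name.

```shell
# blocks_to_csv_quoted (hypothetical name): rough stand-in for Text::CSV quoting.
# A field containing a comma or a double quote is wrapped in double quotes,
# and embedded double quotes are doubled (RFC 4180 style).
blocks_to_csv_quoted() {
  awk '
    function q(s) {
      if (s ~ /[",]/) { gsub(/"/, "\"\"", s); s = "\"" s "\"" }
      return s
    }
    BEGIN { RS = ""; FS = "\n" }      # paragraph mode; one field per line
    {
      hdr = ""; row = ""
      for (i = 1; i <= NF; i++) {
        p = index($i, ":")            # cut at the first colon
        key = substr($i, 1, p - 1)
        val = substr($i, p + 1)
        sub(/^:*[ \t]*/, "", val)     # drop a second ":" and leading blanks
        sep = (i > 1) ? "," : ""
        hdr = hdr sep q(key)
        row = row sep q(val)
      }
      if (NR == 1) print hdr
      print row
    }
  '
}
```

Values cannot contain newlines here (each line is one key/value pair), so commas and double quotes are the only characters this sketch quotes; Text::CSV covers more cases, such as encodings and embedded newlines.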