How do I convert a string in this format to CSV?

Asked: 2019-12-02 05:59:34

Tags: linux perl awk

My string is:

AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE

....

I need to get the following format using standard Linux commands such as awk or ..., or a Perl function:

AA,BB,C,DD,E
aaaaaaaa,bbbbbb,ccccc,dddddd,eeeee
aaaaaaaa2,bbbbbb2,ccccc2,dddddd2,eeeee2
  

For example:
OUTPUT_STRING | awk ....
  or
perlFunction(OUTPUT_STRING){
.....
  return formated_string;
}

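I have searched Google and tried many suggestions from various sites, but none of them worked, so please don't just send me links.

Some fields have a single : and some have a double :: (this is random).

I tried a few suggestions, but they did not help:

sed -r 's/\\,|,|CN=|OU*//g' |awk -F "|=|:" '{printf $2"|"}'
or
sed -n '1h; 2,$H;${g;s/\n/,/g;p}' | sed 's/,,/\n/g'
or
awk -F ":" '{printf $2} {if (NF==0) {printf "\n"}}' | sed "s/ //" | sed "s/ /;/g"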

3 answers:

Answer 0 (score: 4)

One of many ways to get the desired result:

use strict;
use warnings;

my $file = do { local $/; <DATA> };         # read whole file
my @blocks = split /\n\n/, $file;           # split file into blocks

my $print_header = 1;                       # flag to print header

foreach my $block (@blocks) {               # process each block
    $block =~ s/:+/:/g;                     # clean up the block :: -> :

    my @lines = split /\n/, $block;         # split the block into lines
    my(@header,@data);                      # arrays to store header and data

    foreach my $line (@lines) {             # process each line
        my($h,$d) = split /:\s*/, $line;    # split line into header and data part
        push @header, $h;                   # add header names into array
        push @data, $d;                     # add data into array
    }

    if( $print_header ){                    # if header not printed yet
        print join(',', @header) . "\n";    # print header array
        $print_header = 0;                  # flag the header is printed 
    }

    print join(',', @data)   . "\n";        # print data array
}

__DATA__
AA:: aaaaaaaaaaaaaaaaa
BB:: bbbbbbbbbbbbbbbb
C: ccccccccccccccccc
DD:: DDDDDDDDDDDD
E: EEEEEEEEEEEEE

AA: aaaaaaaaaaaaaaaaa2
BB:: bbbbbbbbbbbbbbbb2
C:: ccccccccccccccccc2
DD: DDDDDDDDDDDD2
E: EEEEEEEEEEEEE2

Output

AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE2

Answer 1 (score: 2)

GNU awk should do it:

awk -v RS='' -F':* ?|\n' 'NR==1{print $1","$3","$5","$7","$9} {print $2","$4","$6","$8","$10}' t
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE
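  • RS='' sets the record separator to the empty string, which puts awk in paragraph mode: each blank-line-separated block is read as one record.
  • -F':* ?|\n' sets the field separator to zero or more colons followed by an optional space (so both ": " and ":: " match), or a newline. Because the colons are optional, a lone space also matches, so values that contain spaces would be split into extra fields.
  • NR==1{print $1","$3","$5","$7","$9} prints the header names, taken from the first record only.
  • {print $2","$4","$6","$8","$10} prints the data fields of every record.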
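To see why the names end up in the odd-numbered fields and the values in the even-numbered ones, it can help to dump every field together with its index. The following is only an illustrative sketch: the show_fields function name and the two-record sample input are made up, and the separator is written as ':+ ?' (at least one colon) rather than ':* ?', an assumption made here so the result does not depend on how a particular awk treats a separator that can match the empty string.

```shell
# Hypothetical helper: print "record.field=value" for every field,
# reading blank-line-separated records (paragraph mode, RS='').
show_fields() {
  awk -v RS='' -F':+ ?|\n' '{
    for (i = 1; i <= NF; i++) printf "%d.%d=%s\n", NR, i, $i
  }'
}

# Made-up sample: two records with two "name: value" lines each.
printf 'AA:: a1\nBB: b1\n\nAA: a2\nBB:: b2\n' | show_fields
```

With this sample each record yields four fields: the names sit in fields 1 and 3 and the values in fields 2 and 4, which is the pattern the print statements above rely on.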

A more generic solution that works with any number of fields:

awk -v RS='' -F':* ?|\n' 'NR==1{for(i=1;i<=NF-2;i+=2) printf "%s,",$i;print $i} {for(i=2;i<=NF-2;i+=2) printf "%s,",$i;print $i}' file
AA,BB,C,DD,E
aaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbb,ccccccccccccccccc,DDDDDDDDDDDD,EEEEEEEEEEEEE
aaaaaaaaaaaaaaaaa2,bbbbbbbbbbbbbbbb2,ccccccccccccccccc2,DDDDDDDDDDDD2,EEEEEEEEEEEEE

PS: If not all records have all the IDs, that is a whole different story to write.
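For the case raised in the PS, where some records lack some IDs, one possible approach is to collect every name seen across all records and print the table only at the end. This is a minimal sketch in portable awk, not the answerer's method: blocks_to_csv and the sample input are hypothetical, columns are ordered by first appearance, and a missing value becomes an empty field.

```shell
# Hypothetical helper: convert "name: value" blocks to CSV even when
# some blocks lack some names (missing values become empty fields).
blocks_to_csv() {
  awk -v RS='' -F'\n' '
    {
      for (i = 1; i <= NF; i++) {
        pos = index($i, ":")            # the first colon ends the name
        if (pos == 0) continue          # skip lines without a colon
        name = substr($i, 1, pos - 1)
        value = substr($i, pos)
        sub(/^:+ */, "", value)         # drop ":" or "::" and following spaces
        if (!(name in seen)) {          # remember names in first-seen order
          seen[name] = 1
          names[++n] = name
        }
        val[NR, name] = value
      }
    }
    END {
      # Header row, then one row per record in input order.
      for (j = 1; j <= n; j++) printf "%s%s", names[j], (j < n ? "," : "\n")
      for (r = 1; r <= NR; r++)
        for (j = 1; j <= n; j++)
          printf "%s%s", val[r, names[j]], (j < n ? "," : "\n")
    }'
}

# Made-up sample: the second block has no BB, the first has no C.
printf 'AA:: a1\nBB: b1\n\nAA: a2\nC:: c2\n' | blocks_to_csv
```

Because each line is split at its first colon only, a value that itself contains a colon is kept intact. The trade-off is that all values are held in memory until the END block runs.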

Answer 2 (score: 0)

Use Text::CSV to handle edge cases:

use strict;
use warnings;
use Text::CSV 'csv';

my $input = do { local $/; readline }; # input from STDIN or filename argument

my @aoh;
my %headers;
foreach my $block (split /\n\n+/, $input) {
  my %row;
  foreach my $line (split /^/, $block) {
    if ($line =~ m/^([^:]+):+\s*(.*)$/) {
      $row{$1} = $2;
      $headers{$1} = 1;
    }
  }
  push @aoh, \%row;
}

csv(in => \@aoh, out => *STDOUT, headers => [sort keys %headers],
  encoding => 'UTF-8', auto_diag => 2);
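One edge case that a CSV module takes care of is a value that itself contains a comma or a double quote, which would otherwise corrupt the output row. The sketch below shows the usual CSV quoting convention in plain awk, for comparison with the awk answers; it is not how Text::CSV is implemented, and the quoted_rows and csv_quote names and the sample input are hypothetical. It prints data rows only, without a header.

```shell
# Hypothetical helper: print the values of each block as one CSV row,
# quoting a value only when it contains a comma or a double quote.
quoted_rows() {
  awk -v RS='' -F'\n' '
    function csv_quote(s) {
      if (s ~ /[",]/) {
        gsub(/"/, "\"\"", s)            # double every embedded quote
        return "\"" s "\""              # then wrap the value in quotes
      }
      return s
    }
    {
      for (i = 1; i <= NF; i++) {
        value = $i
        sub(/^[^:]*:+ */, "", value)    # strip "name:" or "name::" and spaces
        printf "%s%s", csv_quote(value), (i < NF ? "," : "\n")
      }
    }'
}

# Made-up sample with a comma and embedded quotes in the values.
printf 'AA:: plain\nBB: has, comma\nC: say "hi"\n' | quoted_rows
```

Values without a comma or quote pass through unchanged, so output for the data in the question would look the same as before.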