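How can I:
Convert a fixed-length file to a CSV file.
Split the records of the input file (a fixed-length file) according to the column lengths.
I tried converting the file with awk, but the result is wrong because of the spaces inside the records.
Input file: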
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
004009164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
004009164
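The first record starts with 4002000W1ABCDABCD7821 and ends with 00409164; each record spans two lines.
The input file contains 6 records of a table.
Each record has more than 40 columns; I have listed only some of them.
The columns have fixed lengths, as follows:
ABC_ID(9), def_sc(8), sde_hd(8), mln_hfg(12), ghi_jkl(13), ijk_klm(6), pqr_xyz(10)
The expected output is:
Output file: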
ABC_ID, def_sc, sde_hd, mln_hfg, ghi_jkl, ijk_klm, pqr_xyz
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
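Is it possible to do this with sed?
Please advise.
Answer 0 (score: 1)
It is not entirely clear what you want, but GNU awk, with FIELDWIDTHS and a multi-character RS, is one option: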
$ awk -v RS='^$' -v FIELDWIDTHS="9 8 8 8" -v OFS=', ' '{gsub(/\n/,""); print $1, $2, $3, $4}' file
4002000W1, ABCDABCD, 78211234, 56789071
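FIELDWIDTHS and a regular-expression RS are GNU awk extensions. Where only a POSIX awk is available, the same idea (treat the file as one stream of characters, then cut it at fixed offsets) can be sketched with substr(). This is a minimal sketch, not the answerer's code: the helper name fixed_to_csv is made up for illustration, and the widths are the seven listed in the question. Like the command above, it assumes every record has exactly the total width (66 characters) once blanks and newlines are removed.

```shell
# fixed_to_csv is a hypothetical helper name, not part of the answer above.
# Usage: fixed_to_csv file   (or pipe the data on stdin)
fixed_to_csv() {
    awk -v widths="9 8 8 12 13 6 10" '
    BEGIN {
        n = split(widths, w, " ")
        for (i = 1; i <= n; i++) reclen += w[i]   # 66 for the widths above
    }
    {
        gsub(/[ \t\r]/, "")        # drop blanks inside the physical line
        buf = buf $0               # append to the running character stream
        while (length(buf) >= reclen) {
            out = ""; pos = 1
            for (i = 1; i <= n; i++) {
                sep = (i > 1) ? ", " : ""
                out = out sep substr(buf, pos, w[i])
                pos += w[i]
            }
            print out
            buf = substr(buf, reclen + 1)   # keep any leftover characters
        }
    }' "$@"
}
```

Calling fixed_to_csv file prints one comma-separated row per 66 characters of input. Because the cut is purely positional, a single record that is one character short shifts every row after it.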
Answer 1 (score: 1)
awk -v FIELDWIDTHS="9 8 8 12 13 6 10" 'NR%2{temp=$0;next;} {$0=temp$0; gsub(/ /,""); print $1,$2,$3,$4,$5,$6,$7}' OFS=',' file
Input (file):
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
Output:
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
To add the header line, just print it first inside a BEGIN{...} block:
awk -v FIELDWIDTHS="9 8 8 12 13 6 10" 'BEGIN{print "ABC_ID, def_sc, sde_hd, mln_hfg, ghi_jkl, ijk_klm, pqr_xyz"} NR%2{temp=$0;next;} {$0=temp$0; gsub(/ /,""); print $1,$2,$3,$4,$5,$6,$7}' OFS=',' file
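FIELDWIDTHS="9 8 8 12 13 6 10" : specifies the widths of the fields to print.
NR%2{temp=$0;next;} : stores each odd-numbered line in the temp variable (it is used to join the pair of lines).
$0=temp$0 : joins each consecutive pair of lines. $0 is the current line and temp is the line before it.
gsub(/ /,""); : removes the space characters.
print $1,$2,$3,$4,$5,$6,$7 : prints the fields as split according to FIELDWIDTHS.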
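The question also asks whether sed can do this. The same three steps (join each pair of lines, remove the spaces, cut at the column widths) can be sketched in sed with seven capture groups. This is only an illustration under a few assumptions: the helper name fixed_to_csv_sed is made up, -E (extended regular expressions) is assumed to be available, as it is in GNU and BSD sed, and each record is assumed to occupy exactly two lines.

```shell
# fixed_to_csv_sed is a hypothetical helper name used for illustration.
# Usage: fixed_to_csv_sed file   (or pipe the data on stdin)
fixed_to_csv_sed() {
    # N          append the next input line to the pattern space
    # s/\n//     remove the newline that joined the two lines
    # s/ //g     remove the spaces inside the record
    # last s///  cut the 66 characters into 9 8 8 12 13 6 10 wide fields
    sed -E -e 'N' -e 's/\n//' -e 's/ //g' \
        -e 's/^(.{9})(.{8})(.{8})(.{12})(.{13})(.{6})(.{10})$/\1,\2,\3,\4,\5,\6,\7/' "$@"
}
```

A joined record that is not exactly 66 characters long does not match the last pattern and is printed without commas, which makes malformed records easy to spot. With more than nine columns the backreference limit (\1 to \9) becomes a problem, so for the full 40-plus columns awk is probably the more practical tool.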
Answer 2 (score: 0)
Here is a Perl solution:
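use strict;
use warnings;

# Column widths and header names, in output order.
my @fmt  = (9, 8, 8, 12, 13, 6, 10);
my @head = qw(ABC_ID def_sc sde_hd mln_hfg ghi_jkl ijk_klm pqr_xyz);

# Total record length is the sum of the column widths.
my $rec_len = 0;
$rec_len += $_ for @fmt;

# Slurp the whole file and strip all whitespace, including newlines.
my $fn = 'file';
open(my $fh, '<', $fn) or die "Could not open file '$fn': $!\n";
my $str = do { local $/ = undef; <$fh> };
close($fh);
$str =~ s/\s+//g;

# One capture group per column, e.g. (.{9})(.{8})...
my $regex = join("", map { "(.{$_})" } @fmt);

# Left-justified header cells, padded to the column widths.
my $head_fmt = join(", ", map { "%-${_}s" } @fmt) . "\n";
printf $head_fmt, @head;

# Cut the stream into records of $rec_len characters, then into fields.
while ( $str =~ /(.{$rec_len})/g ) {
    my @f = $1 =~ /$regex/;
    print join(", ", @f) . "\n";
}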
Output:
ABC_ID   , def_sc  , sde_hd  , mln_hfg     , ghi_jkl      , ijk_klm, pqr_xyz
4002000W1, ABCDABCD, 78211234, 5671LSN12301, 6300000000000, 009164, 0004091644
002000W1A, BCDABCD7, 82112345, 671LSN123016, 3000000000000, 091640, 0040916440
02000W1AB, CDABCD78, 21123456, 71LSN1230163, 0000000000000, 916400, 0409164400
2000W1ABC, DABCD782, 11234567, 1LSN12301630, 0000000000009, 164000, 4091644002
000W1ABCD, ABCD7821, 12345671, LSN123016300, 0000000000091, 640004, 0091644002
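The rows above drift one character to the left on each line, and only five rows appear for six records. The likely cause is in the sample data rather than the code: the seven widths add up to 66, but the first four records in the question contain only 65 non-blank characters (21 + 19 + 17 on the first line, 8 on the second), so each 66-character cut borrows one character from the next record. A small check run before converting can catch this. The sketch below is an illustration only: check_record_lengths is a made-up name, and it assumes each record occupies exactly two lines.

```shell
# check_record_lengths is a hypothetical helper name used for illustration.
# It prints one message per malformed record and returns 1 if any was found.
check_record_lengths() {
    awk -v widths="9 8 8 12 13 6 10" '
    BEGIN {
        n = split(widths, w, " ")
        for (i = 1; i <= n; i++) want += w[i]   # expected record length
    }
    { gsub(/[ \t\r]/, "") }                     # ignore blanks in every line
    NR % 2 { first = $0; next }                 # remember the odd line
    {
        got = length(first $0)                  # length of the joined pair
        if (got != want) {
            printf "record %d: %d characters, expected %d\n", NR / 2, got, want
            bad = 1
        }
    }
    END { exit bad + 0 }' "$@"
}
```

Run against the question's sample, check_record_lengths file would report records 1 to 4 as 65 characters long, while the input shown in Answer 1 (second line 000409164, nine characters) passes silently.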