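I have a comma-separated file that I am formatting into two columns using printf. I use awk to collect the contents into like groups so that they can be printed in properly formatted columns.
The formatting works, but the contents of the array run past the end of the line instead of wrapping.
Sample input file: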
1,test,test1,test1
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
2,test,test1,test2
Command used:
awk -F"," 'NR>1 {a[$3]=a[$3] ? a[$3]", "$4" ("$2")" : $4" ("$2")"}
END {for (i in a) {print i":"a[i]}}' test.dat |
sort |
awk -F":" 'BEGIN { printf "%-15s %-10s\n", "COLUMN1","COLUMN2"; printf "%-15s %-10s\n", "-----------","----------"}
{ printf "%-15s %-10s\n", $1,$2}'
I am also aware of, and have tried, column -t -s"," and pr.
The result looks something like this (mock example):
COLUMN1  COLUMN2
======== =======
1        test1
2        test2, test2, test2, test2, test2, test2,test2, test2, test2,test2, test2, test2, test2, test2
How can I wrap the second column (and the first too, if it is too long) so that it fits within its frame?
COLUMN1  COLUMN2
======== =======
1        test1
2        test2, test2, test2, test2, test2, test2,test2, test2,
         test2,test2, test2, test2, test2, test2
Answer 0 (score: 2)
Given the sample input you posted and the output you say you get, let's just pretend this is what your original script is doing:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="
    for (key in vals) {
        print key, vals[key]
    }
}
$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2
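That gives us a good starting point for your question: you now want to wrap each column, right? If so, I would use an existing UNIX tool such as fold or fmt to do the wrapping, so that you don't have to write your own code to handle splitting at spaces versus mid-word, etc.: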
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="
    for (key in vals) {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(vals[key],50,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print keyArr[lineNr], valArr[lineNr]
        }
    }
}
# Split inStr into lines of at most wid characters; returns the line count.
function wrap(inStr,wid,outArr,    cmd,line,numLines) {
    delete outArr    # drop lines left over from a previous call
    if ( length(inStr) > wid ) {
        # inStr is passed to the shell inside double quotes, so this assumes
        # it contains no ", $, ` or \ characters.
        cmd = "printf \047%s\n\047 \"" inStr "\" | fold -s -w " wid+0
        while ( (cmd | getline line) > 0 ) {
            outArr[++numLines] = line
        }
        close(cmd)
    }
    else {
        outArr[++numLines] = inStr
    }
    return numLines+0
}
$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2,
        test2, test2, test2, test2, test2
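To see what fold contributes on its own, it can help to run it outside awk. The following is a minimal sketch, assuming a fold that supports the -s and -w options; the function name wrap_with_fold and the variable long are made up for illustration and are not part of the script above.

```shell
#!/bin/sh
# Sample value: the twelve "test2" entries collected for key 2.
long='test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2, test2'

# Hypothetical helper: wrap the text in $1 to at most $2 columns,
# breaking at blanks (-s) rather than in the middle of a word.
wrap_with_fold() {
    printf '%s\n' "$1" | fold -s -w "$2"
}

wrap_with_fold "$long" 50
```

With -s, fold breaks after the last blank that still fits, so the first output line keeps its trailing space. fmt could probably be substituted in the same way, although it refills text by its own rules.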
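If you have many fields that need wrapping, this will not be fast, because every call to fold spawns a subshell. So here is an all-awk version that splits at spaces where possible; test it for edge cases and massage it to suit: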
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ vals[$1] = ($1 in vals ? vals[$1] ", " : "") $4 }
END {
    print "column1", "column2"
    print "=======", "======="
    for (key in vals) {
        numKeyLines = wrap(key,15,keyArr)
        numValLines = wrap(vals[key],50,valArr)
        numLines = (numKeyLines > numValLines ? numKeyLines : numValLines)
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print keyArr[lineNr], valArr[lineNr]
        }
    }
}
# Split inStr into lines of at most wid characters; returns the line count.
function wrap(inStr,wid,outArr,    lineEnd,numLines) {
    delete outArr    # drop lines left over from a previous call
    while ( length(inStr) > wid ) {
        # Break before the last space in the first wid characters, else at wid.
        lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
        outArr[++numLines] = substr(inStr,1,lineEnd)
        inStr = substr(inStr,lineEnd+1)
        sub(/^[[:space:]]+/,"",inStr)
    }
    outArr[++numLines] = inStr
    return numLines
}
$ awk -f tst.awk file
column1 column2
======= =======
1       test1
2       test2, test2, test2, test2, test2, test2, test2,
        test2, test2, test2, test2, test2
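Since the all-awk wrap() is meant to be checked against edge cases, a small harness can make that easier. This is only a sketch: the shell function wrap_demo and its tab-separated "width, text" input format are assumptions made for illustration, and the wrapping loop follows the all-awk script above.

```shell
#!/bin/sh
# Hypothetical harness: read "width<TAB>text" lines, wrap the text with
# the all-awk wrap() logic, and print each resulting piece in brackets.
wrap_demo() {
    awk -F'\t' '
    function wrap(inStr,wid,outArr,    lineEnd,numLines) {
        while ( length(inStr) > wid ) {
            lineEnd = ( match(substr(inStr,1,wid),/.*[[:space:]]/) ? RLENGTH - 1 : wid )
            outArr[++numLines] = substr(inStr,1,lineEnd)
            inStr = substr(inStr,lineEnd+1)
            sub(/^[[:space:]]+/,"",inStr)
        }
        outArr[++numLines] = inStr
        return numLines
    }
    {
        n = wrap($2, $1+0, parts)
        for (i = 1; i <= n; i++) printf "[%s]\n", parts[i]
    }'
}

printf '10\tshort\n'           | wrap_demo   # fits:        [short]
printf '10\taaa bbb ccc ddd\n' | wrap_demo   # at spaces:   [aaa bbb] [ccc ddd]
printf '5\tabcdefghijkl\n'     | wrap_demo   # no spaces:   [abcde] [fghij] [kl]
printf '7\taaa bbb ccc\n'      | wrap_demo   # exact fit:   [aaa] [bbb ccc]
```

The last case shows one edge worth knowing about: when a word ends exactly at the width, the space that follows it lies outside the inspected prefix, so the line is broken one word earlier than strictly necessary. Whether that matters depends on how tight the column is.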
Answer 1 (score: 0)
Here is a version that uses perl instead of awk:
#!/usr/bin/env perl
use warnings;
use strict;
my ($col1, $col4, @col4data);
print <<EOF;
COLUMN1 COLUMN2
======= =======
EOF
{
    my $line = <>;
    chomp $line;
    ($col1, $col4data[0]) = (split /,/, $line)[0,3];
}
while (<>) {
    chomp;
    my ($c, $a) = (split /,/)[0,3];
    if ($c ne $col1) {
        $col4 = join ", ", @col4data;
        write;
        @col4data = ();
        $col1 = $c;
    }
    push @col4data, $a;
}
$col4 = join ", ", @col4data;
write;
format STDOUT =
@<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<
$col1,   $col4
~~       ^<<<<<<<<<<<<<<<<<<<<<<
$col4
.
Example:
$ perl columns.pl input.csv
COLUMN1 COLUMN2
======= =======
1        test1
2        test2, test2, test2,
         test2, test2, test2,
         test2, test2, test2,
         test2, test2, test2
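The magic here is the use of fill mode in an output format to do the line wrapping. Adjust the widths as needed by adding more < characters to the obvious parts of the format specification.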