I have an input text file on Unix that contains data like this:
Event_date:20190512044638
Error_code:5858
Event_type:GPRS data
Duration:772
Missing_provider_id:46009

Event_date:20190512044638
Error_code:780678

Event_date:20190512064535
Error_code:5858
Event_type:GPRS data
Duration:2172
Missing_provider_id:722310
I want this data in the following output format:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
I tried a combination of awk and sed commands, but could not get it to work. How can I produce this output?
Answer 0 (score: 1)
Using GNU awk and a 2D array:
awk '
BEGIN {
    r=2                                   # data records in a start from 2
    FS=":"                                # split at :
    OFS="\t"                              # tab separated fields
    a[0][0]                               # initialize a array
}
$0!="" {                                  # for nonempty records
    if(!($1 in a[0])) {                   # add keys to headers when needed
        a[0][$1]=++f                      # for lookups
        a[1][f]=$1                        # for printing
    }
    a[r][a[0][$1]]=$2                     # store value
    next
}
{                                         # empty record -> new array record
    r++
}
END {                                     # after records are processed
    # delete a[0][0] #
    for(i=1;i<=r;i++)                     # iterate records
        for(j=1;j<=f;j++)                 # iterate fields
            printf "%s%s",a[i][j],(j==f?ORS:OFS)  # output
}
' file | column -t -s $'\t'               # column used for pretty-print
Output:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
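The same idea can probably be carried over to awks without true arrays of arrays by using the classic `a[i,j]` subscripts instead. This is only a sketch under some assumptions: the function name `rows_to_table_posix` is made up for illustration, records are assumed to be separated by blank lines, and values are assumed not to contain a colon.

```shell
#!/bin/sh
# rows_to_table_posix is a hypothetical helper name. It reads blank-line
# separated key:value records from stdin or from the files given as arguments.
rows_to_table_posix() {
    awk -F: -v OFS='\t' '
    BEGIN { r = 1 }                     # index of the record being filled
    $0 != "" {                          # nonempty line: one key:value pair
        if (!($1 in col)) {             # first time this key shows up
            col[$1] = ++f               # key -> column number
            name[f] = $1                # column number -> key, for the header
        }
        val[r, col[$1]] = $2            # pseudo-2D subscript, joined with SUBSEP
        seen = 1
        next
    }
    seen { r++; seen = 0 }              # blank line after data: next record
    END {
        if (!seen) r--                  # input ended on a blank line (or was empty)
        for (j = 1; j <= f; j++)        # header row
            printf "%s%s", name[j], (j == f ? ORS : OFS)
        for (i = 1; i <= r; i++)        # data rows
            for (j = 1; j <= f; j++)
                printf "%s%s", val[i, j], (j == f ? ORS : OFS)
    }' "$@"
}

# Demo with the first two records from the question.
printf '%s\n' 'Event_date:20190512044638' 'Error_code:5858' \
    'Event_type:GPRS data' 'Duration:772' 'Missing_provider_id:46009' '' \
    'Event_date:20190512044638' 'Error_code:780678' | rows_to_table_posix
```

Unlike the GNU awk version, the header is kept in its own array rather than in row 1, and runs of several blank lines do not create empty rows. The output is tab-separated and can be piped through column as in the answer above.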
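Answer 1 (score: 0)
This awk command does the job (using tab-separated fields), with one caveat: all fields must appear in the same order in every record, and a field missing from the middle of a record shifts the later values into the wrong columns.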
awk -F: 'NR==1 {print $1,$3,$5,$7,$9} {print $2,$4,$6,$8,$10}' RS= ORS='\n' OFS='\t' file
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
A more generic solution:
awk -F: 'NR==1 {print $1,$3,$5,$7,$9} {for(i=2;i<=NF;i+=2) printf "%s\t",$i;print ""}' RS= ORS='\n' OFS='\t' file
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
The NR==1 {print $1,$3,$5,$7,$9} part can be replaced with a static header, for example NR==1 {print "F1","F2","F3","F4","F5"}, and so on.
Answer 2 (score: 0)
Here is another one:
awk -F: -v RS= 'BEGIN  {OFS=FS}
                NR==FNR {for(i=1;i<NF;i+=2)
                           if(!($i in h)) {h[$i]; ho[++c]=$i};
                         next}
                FNR==1 {for(i=1;i<=c;i++) printf "%s",ho[i] (i==c?ORS:OFS)}
                       {delete v;
                        for(i=1;i<NF;i+=2) v[$i]=$(i+1);
                        for(i=1;i<=c;i++) printf "%s", v[ho[i]] (i==c?ORS:OFS)}' file{,} |
column -ts:
Event_date Error_code Event_type Duration Missing_provider_id
20190512044638 5858 GPRS data 772 46009
20190512044638 780678
20190512064535 5858 GPRS data 2172 722310
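This uses no 2D arrays, but it has to scan the file twice: the first pass collects all the header names, so the second pass can print each record as it is read without keeping any data in memory.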