How do I transpose or rotate the data of a text file in Unix?

Asked: 2019-06-25 09:27:14

Tags: linux unix awk grep transpose

I have an input text file on Unix that contains data like this:

Event_date:20190512044638
Error_code:5858
Event_type:GPRS data
Duration:772
Missing_provider_id:46009

Event_date:20190512044638
Error_code:780678

Event_date:20190512064535
Error_code:5858
Event_type:GPRS data
Duration:2172
Missing_provider_id:722310

I would like this data in the following output format:

Event_date      Error_code  Event_type  Duration  Missing_provider_id
20190512044638  5858        GPRS data   772       46009
20190512044638  780678      
20190512064535  5858        GPRS data   2172      722310

I tried combinations of awk and sed commands but could not get it to work. How can I get this output?

3 answers:

Answer 0 (score: 1)

Using GNU awk and a 2D array:

awk '
BEGIN {
    r=2                                           # data rows start at a[2] (a[0]: key lookup, a[1]: header row)
    FS=":"                                        # split at :
    OFS="\t"                                      # tab-separated output fields
    a[0][0]                                       # initialize a[0] as a subarray
}
$0!="" {                                          # for nonempty records
    if(!($1 in a[0])) {                           # add keys to headers when needed
        a[0][$1]=++f                              # for lookups
        a[1][f]=$1                                # for printing
    }
    a[r][a[0][$1]]=$2                             # store value
    next
}
{                                                 # empty record -> new array record
    r++
}
END {                                             # after records are processed
    # delete a[0][0]                              # optional: drop the placeholder element
    for(i=1;i<=r;i++)                             # iterate records
        for(j=1;j<=f;j++)                         # iterate fields
            printf "%s%s",a[i][j],(j==f?ORS:OFS)  # output
}
' file | column -t -s $'\t'                       # column used for pretty-print

Output:

Event_date      Error_code  Event_type  Duration  Missing_provider_id
20190512044638  5858        GPRS data   772       46009
20190512044638  780678
20190512064535  5858        GPRS data   2172      722310
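
The `a[i][j]` arrays-of-arrays syntax above is specific to GNU awk. For other awk implementations, the same idea can be sketched with classic `a[i,j]` subscripts. This is a substitute technique, not the answer's original code, and `transpose_records` is a hypothetical helper name. It assumes records are separated by blank lines and that each line has the form `key:value`.

```shell
# transpose_records is a hypothetical helper name; it reads "key:value"
# records separated by blank lines on stdin and writes tab-separated rows.
transpose_records() {
  awk -F: -v OFS='\t' '
    $0 != "" {                         # key:value line
      if (!($1 in col)) {              # first time this key is seen
        col[$1] = ++nf                 # key -> column number
        head[nf] = $1                  # column number -> key
      }
      if (!active) { nr++; active = 1 }  # first line of a new record
      val[nr, col[$1]] = substr($0, length($1) + 2)  # keep any ":" inside the value
      next
    }
    { active = 0 }                     # blank line closes the current record
    END {
      for (j = 1; j <= nf; j++) printf "%s%s", head[j], (j == nf ? ORS : OFS)
      for (i = 1; i <= nr; i++)
        for (j = 1; j <= nf; j++) printf "%s%s", val[i, j], (j == nf ? ORS : OFS)
    }
  '
}
```

It could be used as `transpose_records < file`, optionally piped through `column` for alignment as in the answer. Because a new row is only started when a non-empty line is seen, repeated or trailing blank lines do not produce empty rows.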

Answer 1 (score: 0)

awk can do this (using tab-separated fields):

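PS: the fields must appear in the same order in every record; if a field is missing from the middle of a record, this will fail.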

awk -F: 'NR==1 {print $1,$3,$5,$7,$9} {print $2,$4,$6,$8,$10}'  RS= ORS='\n' OFS='\t' file
Event_date      Error_code      Event_type      Duration        Missing_provider_id
20190512044638  5858    GPRS data       772     46009
20190512044638  780678
20190512064535  5858    GPRS data       2172    722310

A more general solution:

awk -F: 'NR==1 {print $1,$3,$5,$7,$9} {for(i=2;i<=NF;i+=2) printf "%s\t",$i;print ""}'  RS= ORS='\n' OFS='\t' file
Event_date      Error_code      Event_type      Duration        Missing_provider_id
20190512044638  5858    GPRS data       772     46009
20190512044638  780678
20190512064535  5858    GPRS data       2172    722310

NR==1 {print $1,$3,$5,$7,$9} can be replaced with a static header, for example NR==1 {print "F1","F2","F3","F4","F5"}

Answer 2 (score: 0)

Here is another one:

awk -F: -v RS= 'BEGIN   {OFS=FS}
                NR==FNR {for(i=1;i<NF;i+=2)
                           if(!($i in h)) {h[$i]; ho[++c]=$i}; 
                         next}
                FNR==1  {for(i=1;i<=c;i++) printf "%s",ho[i] (i==c?ORS:OFS)}
                        {delete v;
                         for(i=1;i<NF;i+=2) v[$i]=$(i+1);
                         for(i=1;i<=c;i++) printf "%s", v[ho[i]] (i==c?ORS:OFS)}' file{,} | 
column -ts:

Event_date      Error_code  Event_type  Duration  Missing_provider_id
20190512044638  5858        GPRS data   772       46009
20190512044638  780678
20190512064535  5858        GPRS data   2172      722310

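No 2D arrays are needed, but the file has to be scanned twice to collect all the header information. In return, no data has to be kept in memory: each record is processed as it is read.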
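
The two-pass mechanism relies on two things: `file{,}` is bash brace expansion for `file file`, so awk reads the same file twice, and `NR==FNR` is true only while the first copy is being read. A minimal sketch on a hypothetical temporary file, with the file name written out twice so it also works in shells without brace expansion:

```shell
# Write a small hypothetical sample (two blank-line-separated records) to a temp file.
tmp=$(mktemp)
printf 'a:1\n\nb:2\n' > "$tmp"

# Pass the same file twice; NR==FNR holds only while the first copy is read.
passes=$(awk -v RS= '
  NR == FNR { first++; next }   # pass 1: header names could be collected here
            { second++ }        # pass 2: rows could be printed here
  END { print first, second }   # records counted in each pass
' "$tmp" "$tmp")
printf '%s\n' "$passes"
rm -f "$tmp"
```

Both counters come out equal, since every record is visited once per pass. The approach assumes the input is a regular file that can be reopened, so it does not work on a pipe.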