Reformatting a file with AWK

Time: 2016-06-02 10:56:17

Tags: linux shell awk scripting

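I am struggling to reformat a comma-separated file with awk. The file contains one day of per-minute data for multiple servers and multiple metrics, e.g. 2 records per server per minute for 24 hours.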

Sample input file:

server01,00:01:00,AckDelayAverage,9999  
server01,00:01:00,AckDelayMax,8888  
server01,00:02:00,AckDelayAverage,666  
server01,00:02:00,AckDelayMax,5555  
.....  
server01,23:58:00,AckDelayAverage,4545  
server01,23:58:00,AckDelayMax,8777  
server01,23:59:00,AckDelayAverage,4686  
server01,23:59:00,AckDelayMax,7820  
server02,00:01:00,AckDelayAverage,1231  
server02,00:01:00,AckDelayMax,4185  
server02,00:02:00,AckDelayAverage,1843  
server02,00:02:00,AckDelayMax,9982  
.....  
server02,23:58:00,AckDelayAverage,1022  
server02,23:58:00,AckDelayMax,1772  
server02,23:59:00,AckDelayAverage,1813  
server02,23:59:00,AckDelayMax,9891  

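I am trying to reformat the file so that there is one line per minute, with the unique concatenations of fields 1 and 3 as column headers.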

For example, the expected output file looks like this:

Minute,server01-AckDelayAverage,server01-AckDelayMax,server02-AckDelayAverage,server02-AckDelayMax  

00:01:00,9999,8888,1231,4185  
00:02:00,666,5555,1843,9982  
...  
...  
23:58:00,4545,8777,1022,1772  
23:59:00,4686,7820,1813,9891  

3 answers:

Answer 0: (score: 0)

A solution using GNU awk. Invoke it as awk -F, -f script input_file

/Average/ { average[$2, $1] = $4; }
/Max/ { maximum[$2, $1] = $4; }
{
    if (!($2 in minutes)) {
        minutes[$2] = 1;
    }
    if (!($1 in servers)) {
        servers[$1] = 1;
    }
}
END {
    mcount = asorti(minutes, smin);
    scount = asorti(servers, sserv);
    printf "minutes";
    for (col = 1; col <= scount; col++) {
        printf ",%s-average,%s-maximum", sserv[col], sserv[col];
    }
    print "";
    for (row = 1; row <= mcount; row++) {
        key = smin[row];
        printf "%s", key;
        for (col = 1; col <= scount; col++) {
            printf ",%s,%s", average[key, sserv[col]], maximum[key, sserv[col]];
        }
        print "";
    }
}
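The script above hardcodes the two metric names and needs gawk for asorti. As a rough sketch only, not the answerer's method, the same pivot can be written in plain POSIX awk by keying columns on field 1 joined with field 3 and keeping rows and columns in the order they first appear in the input (an assumption that fits the sample, which is already sorted). The helper names sample_input and pivot are made up for illustration.

```shell
# Abridged sample data copied from the question (hypothetical helper name).
sample_input() {
cat <<'EOF'
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server01,00:02:00,AckDelayAverage,666
server01,00:02:00,AckDelayMax,5555
server02,00:01:00,AckDelayAverage,1231
server02,00:01:00,AckDelayMax,4185
server02,00:02:00,AckDelayAverage,1843
server02,00:02:00,AckDelayMax,9982
EOF
}

# Pivot stdin to one row per minute and one column per server-metric pair.
pivot() {
  awk -F, '
    {
      col = $1 "-" $3                     # column key: server-metric
      if (!(col in seen_col)) { seen_col[col] = 1; cols[++ncol] = col }
      if (!($2 in seen_min))  { seen_min[$2] = 1; mins[++nmin] = $2 }
      val[$2, col] = $4                   # cell value for (minute, column)
    }
    END {
      printf "Minute"
      for (c = 1; c <= ncol; c++) printf ",%s", cols[c]
      print ""
      for (r = 1; r <= nmin; r++) {
        printf "%s", mins[r]
        # A missing (minute, column) pair prints as an empty cell.
        for (c = 1; c <= ncol; c++) printf ",%s", val[mins[r], cols[c]]
        print ""
      }
    }'
}

sample_input | pivot
```

If the input is not already ordered by server and minute, sorting it first (for example with sort -t, -k1,1 -k2,2) would be one way to keep the rows and columns in a predictable order.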

Answer 1: (score: 0)

Using awk and sort:

awk -F, -v OFS=, '{
    a[$2]=(a[$2]?a[$2]","$4:$4)
}
END{
    for ( i in a ) print i,a[i]
}' File | sort

If $4 can contain 0 values:

awk -F, -v OFS=, '!a[$2]{a[$2]=$2} {a[$2]=a[$2]","$4} END{for ( i in a ) print a[i]}' File | sort

!a[$2]{a[$2]=$2}: if array a has no entry yet for index $2 (the minute), create one whose value is $2. This is true the first time a minute appears in a line.

{a[$2]=a[$2]","$4}: appends the value $4 to that array entry, separated by a comma.

END: prints all the values in array a.

Finally, the awk output is piped to sort, because for (i in a) visits the entries in no particular order.
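To illustrate why the second command exists, here is a small sketch comparing the two variants on a made-up two-line input in which the first value for a minute is 0. The function names are hypothetical and the input is not from the question.

```shell
# Hypothetical input: the first value recorded for the minute is 0.
zero_input() {
  printf '%s\n' 'server01,00:01:00,AckDelayAverage,0' \
                'server01,00:01:00,AckDelayMax,5'
}

# Ternary variant: a stored 0 tests as false, so the next value replaces it.
ternary_join() {
  awk -F, -v OFS=, '{ a[$2] = (a[$2] ? a[$2] "," $4 : $4) }
                    END { for (i in a) print i, a[i] }' | sort
}

# Seeded variant: the entry starts as the minute string, which is never
# empty or zero, so every later value (including 0) is appended.
seeded_join() {
  awk -F, '!a[$2] { a[$2] = $2 } { a[$2] = a[$2] "," $4 }
           END { for (i in a) print a[i] }' | sort
}

zero_input | ternary_join
zero_input | seeded_join
```

On this input the ternary variant prints 00:01:00,5 (the 0 is lost), while the seeded variant prints 00:01:00,0,5.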

Answer 2: (score: 0)

Run the awk command: ./script.awk file
