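I am struggling to reformat a comma-separated file using awk. The file contains one day of per-minute data for multiple servers and multiple metrics, e.g. 2 records per server per minute, for 24 hours.
Sample input file: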
server01,00:01:00,AckDelayAverage,9999
server01,00:01:00,AckDelayMax,8888
server01,00:02:00,AckDelayAverage,666
server01,00:02:00,AckDelayMax,5555
.....
server01,23:58:00,AckDelayAverage,4545
server01,23:58:00,AckDelayMax,8777
server01,23:59:00,AckDelayAverage,4686
server01,23:59:00,AckDelayMax,7820
server02,00:01:00,AckDelayAverage,1231
server02,00:01:00,AckDelayMax,4185
server02,00:02:00,AckDelayAverage,1843
server02,00:02:00,AckDelayMax,9982
.....
server02,23:58:00,AckDelayAverage,1022
server02,23:58:00,AckDelayMax,1772
server02,23:59:00,AckDelayAverage,1813
server02,23:59:00,AckDelayMax,9891
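I am trying to reformat the file so that there is one row per minute, with each unique concatenation of fields 1 and 3 as a column header.
For example, the expected output file looks like this: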
Minute,server01-AckDelayAverage,server01-AckDelayMax,server02-AckDelayAverage,server02-AckDelayMax
00:01:00,9999,8888,1231,4185
00:02:00,666,5555,1843,9982
...
...
23:58:00,4545,8777,1022,1772
23:59:00,4686,7820,1813,9891
Answer 0 (score: 0)
A solution using GNU awk. Invoke it as awk -F, -f script input_file:
# Store each metric keyed by (minute, server).
/Average/ { average[$2, $1] = $4 }
/Max/     { maximum[$2, $1] = $4 }
{
    # Record every minute and every server seen.
    minutes[$2] = 1
    servers[$1] = 1
}
END {
    # asorti (GNU awk only) sorts the indices into smin and sserv.
    mcount = asorti(minutes, smin)
    scount = asorti(servers, sserv)
    printf "minutes"
    for (col = 1; col <= scount; col++) {
        printf ",%s-average,%s-maximum", sserv[col], sserv[col]
    }
    print ""
    for (row = 1; row <= mcount; row++) {
        key = smin[row]
        # Pass data as an argument, never as the printf format string.
        printf "%s", key
        for (col = 1; col <= scount; col++) {
            printf ",%s,%s", average[key, sserv[col]], maximum[key, sserv[col]]
        }
        print ""
    }
}
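The script above hard-codes the two metric names and relies on GNU awk's asorti. As a rough sketch of the same idea for any POSIX awk, the variant below builds each column header from fields 1 and 3, as the question asks. The function name pivot_metrics and the array names are made up for illustration. It assumes columns should appear in first-seen order and that lines with fewer than four fields (such as the "....." placeholders) can be skipped.

```shell
# Hypothetical helper: pivot "server,minute,metric,value" records into
# one row per minute, with one column per server-metric pair.
pivot_metrics() {
    awk -F, '
    NF >= 4 {
        col = $1 "-" $3
        # Remember columns and minutes in the order they first appear.
        if (!(col in seen_col)) { seen_col[col] = 1; cols[++ncol] = col }
        if (!($2 in seen_min)) { seen_min[$2] = 1; mins[++nmin] = $2 }
        val[$2, col] = $4
    }
    END {
        # Insertion sort of the minutes; HH:MM:SS strings sort chronologically.
        for (i = 2; i <= nmin; i++) {
            m = mins[i]
            for (j = i - 1; j >= 1 && mins[j] > m; j--) mins[j + 1] = mins[j]
            mins[j + 1] = m
        }
        printf "Minute"
        for (c = 1; c <= ncol; c++) printf ",%s", cols[c]
        print ""
        for (r = 1; r <= nmin; r++) {
            printf "%s", mins[r]
            # A missing (minute, column) pair prints as an empty field.
            for (c = 1; c <= ncol; c++) printf ",%s", val[mins[r], cols[c]]
            print ""
        }
    }' "$@"
}
```

It can be called as pivot_metrics input_file, or fed on standard input. A minute that lacks a value for some column still keeps its position, so the columns stay aligned.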
Answer 1 (score: 0)
Using awk and sort:
awk -F, -v OFS=, '{
a[$2]=(a[$2]?a[$2]","$4:$4)
}
END{
for ( i in a ) print i,a[i]
}' File | sort
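If $4 can be 0 (the version above tests a[$2] for truth, so a minute whose first value is 0 would lose that value):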
awk -F, -v OFS=, '!a[$2]{a[$2]=$2} {a[$2]=a[$2]","$4} END{for ( i in a ) print a[i]}' File | sort
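!a[$2]{a[$2]=$2}: if the element of array a with index $2 (the minute) is still empty, create it with the value $2. The condition is true the first time a minute appears in the input.
{a[$2]=a[$2]","$4}: append the value $4 to that element.
END: print the contents of array a.
Finally, the awk output is piped to sort.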
Answer 2 (score: 0)
Run the awk command: ./script.awk file