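I want to check whether two rows start with the same number in the first column; when they do, the average of their second-column values should be printed. Example file: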
01 21 6 10% 93.3333%
01 22 50 83.3333% 93.3333%
02 20.5 23 18.1102% 96.8504%
02 21.5 100 78.7402% 96.8504%
03 22.2 0 0% 100%
03 21.2 29 100% 100%
04 22.5 1 5.55556% 100%
04 23.5 17 94.4444% 100%
05 22.7 9 7.82609% 100%
05 21.7 106 92.1739% 100%
06 23 11 17.4603% 96.8254%
06 22 50 79.3651% 96.8254%
07 20.5 14 18.6667% 96%
07 21.5 58 77.3333% 96%
08 21.8 4 100% 100%
09 22.6 0 0% 100%
09 21.6 22 100% 100%
For example, the first two rows start with 01, but only one row starts with 08 (line 15). So the output for these two cases should be:
01 21.5
...
...
...
08 21.8
...
...
...
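I ended up with the following awk line, which works well when the file always has two matching rows, but it fails on the file shown above (because of line 15):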
awk '{sum+=$2} (NR%2)==0{print sum/2; sum=0;}'
Any hints are welcome.
Answer 0 (score: 4)
Using GNU awk:
gawk '
    { sum[$1] += $2; n[$1]++ }
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (key in sum) print key, sum[key] / n[key]
    }
' file
01 21.5
02 21
03 21.7
04 23
05 22.2
06 22.5
07 21
08 21.8
09 22.1
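The "PROCINFO" line makes the array traversal visit the indices in ascending numeric order. Without it, the traversal order is unspecified and the output can appear in random order.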
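If GNU awk is not available, one possible alternative (my assumption, not part of the answer above) is to record each key the first time it appears and print the averages in that order. This is only a minimal sketch: `avg_by_key` is a hypothetical function name, and the output follows input order rather than numeric order, which happens to be the same thing for the sample file.

```shell
# avg_by_key is a hypothetical name; it reads rows from standard input.
# Usage: avg_by_key < file
avg_by_key() {
    awk '
        !($1 in n) { order[++k] = $1 }        # remember each key on first sight
        { sum[$1] += $2; n[$1]++ }            # accumulate per-key sum and count
        END {
            for (i = 1; i <= k; i++)          # replay keys in first-seen order
                print order[i], sum[order[i]] / n[order[i]]
        }
    '
}
```

Because the keys are replayed from an explicitly numbered array, the result does not depend on how the awk implementation orders `for (key in array)`.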
Answer 1 (score: 4)
This awk should work:
awk 'function dump(){if (n>0) printf "%s%s%.1f\n", p, OFS, sum/n}
NR>1 && $1 != p{dump(); sum=n=0} {p=$1; sum+=$2; n++} END{dump()}' file
01 21.5
02 21.0
03 21.7
04 23.0
05 22.2
06 22.5
07 21.0
08 21.8
09 22.1
Explanation: we use three variables:
p -> holds the previous row's $1 value
n -> count of rows sharing the current $1 value
sum -> sum of the $2 values for rows sharing the current $1 value
How it works:
NR>1 && $1 != p # true when this is not the first row and the previous $1 differs from the current $1
dump() # function that prints the group's $1 value and its formatted average
p=$1; sum+=$2; n++ # sets p to $1, adds the current $2 to sum, and increments n
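One consequence of comparing each row only with the previous one is that rows sharing a key must be adjacent in the input. The sketch below is a variant of the same idea, not the exact command above: it uses plain `print` instead of `printf`, and `avg_runs` is a hypothetical function name.

```shell
# avg_runs is a hypothetical name; it reads rows from standard input.
# Usage: avg_runs < file
avg_runs() {
    awk '
        function dump() { if (n > 0) print p, sum / n }   # emit the finished run
        NR > 1 && $1 != p { dump(); sum = n = 0 }          # key changed: flush and reset
        { p = $1; sum += $2; n++ }                         # accumulate the current row
        END { dump() }                                     # flush the last run
    '
}
```

With the three rows `01 21`, `02 20`, `01 23`, this reports 01 twice (once per run) instead of one combined average, so an ungrouped file would need sorting on the first column beforehand.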
Answer 2 (score: 1)
awk with a piped sort:
awk '{s[$1]+=$2;c[$1]++} END{for(i in s) print i, s[i]/c[i]}' file | sort
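The pipe to sort is needed because awk does not specify the order of `for (i in s)`. Plain `sort` compares lines as text, which works for the sample file because its keys are zero-padded to the same width. If the keys were not padded (an assumption about other inputs, not about the file in the question), a numeric sort would probably be the safer choice. A minimal sketch, with `avg_sorted` as a hypothetical function name:

```shell
# avg_sorted is a hypothetical name; it reads rows from standard input.
# Usage: avg_sorted < file
avg_sorted() {
    # Average column 2 per key, then order the lines by the leading number.
    awk '{ s[$1] += $2; c[$1]++ } END { for (i in s) print i, s[i] / c[i] }' |
        sort -n
}
```

With keys such as 1, 2, and 10, `sort -n` places 10 last, whereas a text comparison would typically put it before 2.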
Answer 3 (score: 1)
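awk '
second {                        # a row is pending from the previous line
    if ($1 == first) {          # same key: print the average of the pair
        print (second + $2) / 2
        second = 0
        next
    }
    else                        # different key: the pending row stands alone
        print second
}
{
    printf "%s ", $1            # start a new output line with the key
    first = $1                  # remember the key for the next comparison
    second = $2                 # remember the value until the next row is seen
}
END {
    if (second)                 # flush a pending unpaired row
        print second
}' file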