Calculating average over irregular intervals without considering missing values in shell script?

Date: 2016-07-22 01:51:53

Tags: linux shell awk

I have a dataset with many missing values, coded as -999. Part of the data is:

input.txt
30
-999
10
40
23
44
-999
-999
31
-999
54
-999 
-999
-999
-999
-999
-999
10
23
2
5
3
8
8
7
9
6
10
and so on

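I want to calculate the average over each interval of 5, 6, 6 rows (the pattern repeats) without considering the missing values.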

Desired output:

ofile.txt
25.75   (i.e. consider the first 5 rows and take the average without the missing values, so (30+10+40+23)/4)
43      (i.e. consider the next 6 rows and take the average without the missing values, so (44+31+54)/3)
-999    (i.e. consider the next 6 rows; since all of them are missing, write the missing value -999)
8.6     (i.e. consider the next 5 rows and take the average, so (10+23+2+5+3)/5)
8       (i.e. consider the next 6 rows and take the average, so (8+8+7+9+6+10)/6)

If this were a regular interval (say, every 5 rows), I could do it like this:
awk '!/\-999/{sum += $1; count++} NR%5==0{print count ? (sum/count) :-999;sum=count=0}' input.txt
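Below is a minimal sketch of the same regular-interval idea as a reusable shell function. It assumes a POSIX awk and one value per line. It differs from the one-liner in three small ways:

- The missing-value test is a numeric comparison (`$1 != -999`) instead of the regex `/\-999/`, which would also match a value such as `-9990`.
- The interval length is passed in with `-v`.
- An `END` block reports a trailing block shorter than N rows.

The function name `avg_every_n` is hypothetical.

```shell
# avg_every_n N < FILE
# Hypothetical helper: averages every block of N rows read from stdin,
# skipping the -999 missing-value marker (prints -999 for an all-missing block).
avg_every_n() {
    awk -v n="$1" '
        $1 != -999 { sum += $1; count++ }    # numeric comparison, skips -999
        NR % n == 0 {                        # a block of n rows is complete
            print (count ? sum / count : -999)
            sum = count = 0
        }
        END {
            # Report a trailing block shorter than n rows, if there is one.
            if (NR % n != 0)
                print (count ? sum / count : -999)
        }'
}
```

Usage: `avg_every_n 5 < input.txt`. This only handles a fixed interval; the irregular 5,6,6 case still needs a cycling interval length.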

I asked a similar question about regular intervals here: Calculating average without considering missing values in shell script? Here, however, I am asking for a solution for irregular intervals.

2 Answers:

Answer 0 (score: 2)

Using AWK

awk -v f="5" 'f&&f--&&$0!=-999{c++;v+=$0} NR%17==0{f=5;r++} 
!f&&NR%17!=0{f=6;r++} r&&!c{print -999;r=0} r&&c{print v/c;r=v=c=0}
END{if(c!=0)print v/c}' input.txt

Output

25.75
43
-999
8.6
8

Breakdown

f&&f--&&$0!=-999{c++;v+=$0} #add valid values and increment count
NR%17==0{f=5;r++} #17 = 5+6+6 rows: start a new 5,6,6 cycle with a 5-row block
!f&&NR%17!=0{f=6;r++} #otherwise the next block is 6 rows
r&&!c{print -999;r=0} #print -999 if the block had no valid values
r&&c{print v/c;r=v=c=0} #print the average and reset
END{
 if(c!=0) #print the average of a trailing partial block
  print v/c
}

Answer 1 (score: 2)

$ cat tst.awk
function nextInterval(  intervals) {
    numIntervals = split("5 6 6",intervals)
    intervalsIdx = (intervalsIdx % numIntervals) + 1
    return intervals[intervalsIdx]
}

BEGIN {
    interval = nextInterval()
    noVal = -999
}

$0 != noVal {
    sum += $0
    cnt++
}

++numRows == interval {
    print (cnt ? sum / cnt : noVal)
    interval = nextInterval()
    numRows = sum = cnt = 0
}

$ awk -f tst.awk file
25.75
43
-999
8.6
8
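Both answers hard-code the 5,6,6 pattern in the script. Below is a minimal sketch that follows the second answer's cycling-index idea but takes the pattern as an argument. Like the first answer's `END` block, it also reports a trailing block that is shorter than its interval, printing -999 if that block holds only missing values. It assumes a POSIX awk and one value per line. The function name `avg_by_pattern` is hypothetical.

```shell
# avg_by_pattern "5 6 6" < FILE
# Hypothetical helper: averages consecutive blocks whose sizes cycle through
# the given pattern, skipping -999 and printing -999 for an all-missing block.
avg_by_pattern() {
    awk -v pat="$1" -v noVal=-999 '
        BEGIN {
            numIntervals = split(pat, intervals)   # e.g. "5 6 6" -> 5, 6, 6
            idx = 1
        }
        $1 != noVal { sum += $1; cnt++ }
        ++numRows == intervals[idx] {              # current block is complete
            print (cnt ? sum / cnt : noVal)
            idx = idx % numIntervals + 1           # cycle back to the first size
            numRows = sum = cnt = 0
        }
        END {
            # Report a trailing block that is shorter than its interval.
            if (numRows > 0)
                print (cnt ? sum / cnt : noVal)
        }'
}
```

Usage: `avg_by_pattern "5 6 6" < input.txt > ofile.txt`. Passing the pattern as a string keeps the script unchanged if the interval sizes change, and there is no magic cycle length such as 17 to keep in sync.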