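Moving average using awk

Time: 2014-01-06 12:06:27

Tags: awk moving-average

I want to average every 100 points in a column and place each average at the middle of its window, at point 50.

I tried to compute the moving average with this script: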

BEGIN {
    P = 100;
}

{
    x = $2;
    i = NR % P;
    MA += (x - Z[i]) / P;
    Z[i] = x;
    print $1,"\t",$2,"\t",MA;
}

But how do I place the average at the middle of the window?

Sample input:

Depth       Velocity
1150.315    434.929
1150.468    434.929
1150.62     434.929
1150.772    434.929
1150.925    434.929
1151.077    434.929
1151.23     434.929
1151.382    434.929
1151.534    434.929
1151.687    434.929
1151.839    434.929
1151.992    434.929
1152.144    434.929
1152.296    434.929
1152.449    434.929
1152.601    434.929
1152.754    434.929
1152.906    434.929
1153.058    434.929
1153.211    434.929
1153.363    434.929
1153.516    434.929
1153.668    434.929
1153.82     434.929
1153.973    434.929
1154.125    434.929
1154.278    434.929
1154.43     434.929
1154.582    434.929
1154.735    434.929
1154.887    434.929
1155.04     434.929
1155.192    434.929
1155.344    434.929
1155.497    434.929
1155.649    434.517
1155.802    434.105
1155.954    433.693
1156.106    433.233
1156.259    432.773
1156.411    432.313
1156.564    431.853
1156.716    431.853
1156.868    431.853
1157.021    431.853
1157.173    431.853
1157.326    431.853
1157.478    431.853
1157.63     431.853
1157.783    431.853
1157.935    431.853
1158.088    431.853
1158.24     431.853
1158.392    431.853
1158.545    431.853
1158.697    431.853
1158.85     431.853
1159.002    431.853
1159.154    432.642
1159.307    433.431
1159.459    434.221
1159.612    437.791
1159.764    441.363
1159.916    444.933
1160.069    448.505
1160.221    448.037
1160.374    447.569
1160.526    447.101
1160.678    455.151
1160.831    463.208
1160.983    471.259
1161.136    473.544
1161.288    475.826
1161.44     478.111
1161.593    465.778
1161.745    453.435
......           .......

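The output should be the depth and the smoothed (averaged) velocity, with the first and last 50 points dropped.

3 Answers:

Answer 0 (score: 3)

You can try the following script: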

awk -vn=100 -f a.awk file

where a.awk is:

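BEGIN {
    m=int((n+1)/2)              # row inside each window whose depth labels the average
}
{L[NR]=$2; sum+=$2}             # remember every value and extend the running sum
NR>=m {d[++i]=$1}               # collect depths, starting at the middle row of the first window
NR>n {sum-=L[NR-n]}             # drop the value that just left the n-row window
NR>=n{
    a[++k]=sum/n                # the window is full: store its average
}
END {
    for (j=1; j<=k; j++)        # d[j] is the middle depth of the j-th window
        print d[j],a[j]
}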

For example, given the test data:

1  2
2  2
3  3
4  3
5  4
6  14
7  5
8  5
9  6
10 6
11 7
12 7
13 8
14 8
15 9

running awk -vn=5 -f a.awk file gives:

3 2.8
4 5.2
5 5.8
6 6.2
7 6.8
8 7.2
9 5.8
10 6.2
11 6.8
12 7.2
13 7.8
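
For large inputs, the same centered average can be produced without holding every row until END. The following is only a sketch under stated assumptions, not part of the answer above: the function name centered_ma is hypothetical, it keeps just the last n rows in a circular buffer, and it assumes that any row whose second field is not a plain decimal number (such as the Depth/Velocity header in the question) should be skipped.

```shell
# centered_ma N: read "depth value" rows on stdin and print
# "middle-depth average" for every complete N-row window.
# Hypothetical helper name, for illustration only.
centered_ma() {
    awk -v n="$1" '
        BEGIN { m = int((n + 1) / 2) }        # row of the window that labels the average
        $2 !~ /^[-+]?[0-9.]+$/ { next }       # assumption: skip header or non-numeric rows
        {
            r++                               # number of data rows seen so far
            slot = r % n                      # circular-buffer slot for this row
            sum += $2 - val[slot]             # add the new value, drop the one leaving
            val[slot] = $2
            dep[slot] = $1
        }
        r >= n {
            # the window covers data rows r-n+1 .. r; its m-th row is r-n+m
            print dep[(r - n + m) % n], sum / n
        }
    '
}
```

Called as centered_ma 5 < file on the test data above, this should reproduce the eleven lines shown; centered_ma 100 < file would label each average with the 50th depth of its window.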

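Answer 1 (score: 1)

Without sample input it is hard to tell what you want to achieve. Below is an example of a 5-point moving average indexed at the 3rd point of each window; adapt it to your needs.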

awk '
BEGIN {
    OFS = "\t"

    No_of_row = 20

    Average_Point = 5
    index_Point   = 3

    print "Column1" OFS "Column2"

    for (i = 1; i <= No_of_row; i++) {
        C1[i] = i
        C2[i] = i
        print i OFS i
    }
    print RS "index" OFS "Average" OFS "Data_used" OFS "Sum" OFS "No_of_Point"
}
END {
    for (i = 1; i <= No_of_row; i++) {
        for (j = 1; j <= Average_Point; j++) {
            flag = 0
            if (C1[i+j] || j == 1 && C1[i]) {
                add = j == 1 ? C2[i] : C2[(i+j)-1]
                sum += add
                ind  = j == index_Point ? C1[(i+j)-1] : ind
                flag = 1
                s = s ? s "+" add : add
            }
        }

        if (flag == 1) {
            print ind, sum/Average_Point, s, sum, Average_Point
        }

        sum = ind = s = ""
    }
}
' /dev/null

which yields:

Column1  Column2
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9
10       10
11       11
12       12
13       13
14       14
15       15
16       16
17       17
18       18
19       19
20       20

index  Average  Data_used       Sum  No_of_Point
3      3        1+2+3+4+5       15   5
4      4        2+3+4+5+6       20   5
5      5        3+4+5+6+7       25   5
6      6        4+5+6+7+8       30   5
7      7        5+6+7+8+9       35   5
8      8        6+7+8+9+10      40   5
9      9        7+8+9+10+11     45   5
10     10       8+9+10+11+12    50   5
11     11       9+10+11+12+13   55   5
12     12       10+11+12+13+14  60   5
13     13       11+12+13+14+15  65   5
14     14       12+13+14+15+16  70   5
15     15       13+14+15+16+17  75   5
16     16       14+15+16+17+18  80   5
17     17       15+16+17+18+19  85   5

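Answer 2 (score: 0)

You have to modify the number of rows (No_of_row) to suit your file. The input should contain two columns: one used as the index and the other to be averaged. The output will have 100 fewer points than the input, 50 fewer on each side.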

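# Moving average over the second column of a data file
BEGIN {
    OFS = "\t"

    No_of_row     = 10294    # number of rows in the input file; adjust to your data
    Average_Point = 100      # window length
    index_Point   = 50       # position inside the window whose index labels the average
}
# Store both columns by row number (rows beyond No_of_row are ignored).
NR <= No_of_row {
    C1[NR] = $1
    C2[NR] = $2
}
END {
    for (i = 1; i <= No_of_row - Average_Point; i++) {
        for (j = 1; j <= Average_Point; j++) {
            # Guard kept from the original: add only while the look-ahead row i+j is non-empty.
            if (C1[i+j] || j == 1 && C1[i]) {
                add = j == 1 ? C2[i] : C2[(i+j)-1]
                sum += add
                ind  = j == index_Point ? C1[(i+j)-1] : ind
            }
        }
        print ind, sum/Average_Point
        sum = ind = ""
    }
}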
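
The hard-coded row count can be avoided by reading NR in the END block, and the inner loop can be replaced by a difference of cumulative sums. The sketch below is one way to do that, under these assumptions: the helper name prefix_ma and its two arguments (window length, label position) are hypothetical, the input has no header line, and, unlike the script above, it also emits the final complete window, so it prints one more line.

```shell
# prefix_ma N IDX: read "index value" rows on stdin and print, tab-separated,
# the IDX-th index of every complete N-row window and the window average.
# Hypothetical helper, for illustration only; assumes no header line.
prefix_ma() {
    awk -v n="$1" -v idx="$2" '
        {
            key[NR] = $1
            cum[NR] = cum[NR - 1] + $2     # cum[k] = sum of column 2 over rows 1..k
        }
        END {
            OFS = "\t"
            # window i covers rows i .. i+n-1; its sum is cum[i+n-1] - cum[i-1]
            for (i = 1; i + n - 1 <= NR; i++)
                print key[i + idx - 1], (cum[i + n - 1] - cum[i - 1]) / n
        }
    '
}
```

For the question's data this would be called as prefix_ma 100 50 < file, after removing the header line. For very long files the cumulative sum grows large, so a little floating-point precision may be lost compared with summing each window directly.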