awk - moving average in blocks of an ASCII file

Time: 2014-12-07 15:36:52

Tags: awk selected moving-average

I have a large ASCII file that looks like this:

12,3,0.12,965.814
11,3,0.22,4313.2
14,3,0.42,7586.22
17,4,0,0
11,4,0,0
15,4,0,0
13,4,0,0
17,4,0,0
11,4,0,0
18,3,0.12,2764.86
12,3,0.22,2058.3
11,3,0.42,2929.62
10,4,0,0
10,4,0,0
14,4,0,0
12,4,0,0
19,3,0.12,1920.64
20,3,0.22,1721.51
12,3,0.42,1841.55
11,4,0,0
15,4,0,0
19,4,0,0
11,4,0,0
13,4,0,0
17,3,0.12,2738.99
12,3,0.22,1719.3
18,3,0.42,3757.72
.
.
.

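I want to use awk to compute a selective moving average over three values. The selection is made by the second and third columns: the moving average should be computed only for rows whose second column is 3. I want three separate moving averages, one per value of the third column (every "block" contains the same values in the same order). The average itself is taken over the fourth column. For each window I want to output the whole middle (second) row, with the fourth column replaced by the result. I know this sounds complicated, so here is an example of what I want to calculate and the desired result: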

(965.814+2764.86+1920.64)/3 = 1883.77

and output the result together with the rest of line 10:

18,3,0.12,1883.77

Then continue with lines 2, 11 and 18, and so on.

The final result for my data example should look like this:

18,3,0.12,1883.77
12,3,0.22,2697.67
11,3,0.42,4119.13
19,3,0.12,2474.83
20,3,0.22,1833.04
12,3,0.42,2842.96

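I tried to compute the moving average with the following awk code, but I think the script is designed wrong, because awk reports a syntax error at every "$2 == 3".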

BEGIN { FS="," ; OFS = "," }
    $2 == 3 {
        a; b; c; d; e; f = 0        
        line1 = $0; a = $3; b = $4; getline
        line2 = $0; c = $3; d = $4; getline
        line3 = $0; e = $3; f = $4
            $2 == 3 {
                line11 = $0; a = $3; b += $4; getline
                line22 = $0; c = $3; d += $4; getline
                line33 = $0; e = $3; f += $4
                    $2 == 3 {
                        line111 = $0; a = $3; b += $4; getline
                        line222 = $0; c = $3; d += $4; getline
                        line333 = $0; e = $3; f += $4
                    }
            }

        $0 = line11; $3 = a; $4 = b/3; print
        $0 = line22; $3 = c; $4 = d/3; print
        $0 = line33; $3 = e; $4 = f/3
    }
    {print}

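Could you help me understand how to correct my script (I suspect I have misunderstood awk's philosophy), or suggest a completely new script if there is a simpler solution? ;-)

I also tried another idea: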

BEGIN { FS="," ; OFS = "," }
    i=0;
    do {
        i++;
        a; b; c; d; e; f = 0
        $2 == 3 {
        line1 = $0; a = $3; b += $4; getline
        line2 = $0; c = $3; d += $4; getline
        line3 = $0; e = $3; f += $4
    }while(i<3)

        $0 = line1; $3 = a; $4 = b/3; print
        $0 = line2; $3 = c; $4 = d/3; print
        $0 = line3; $3 = e; $4 = f/3
    }
    {print}

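This one does not work either; awk reports two syntax errors (one at "do" and another after "$2 == 3").

I have changed both scripts and tried many things. At some point they ran without errors, but they did not produce the desired output at all, so I think there must be a more general problem.

I hope you can help me, that would be really great!

1 answer:

Answer 0 (score: 2)

Normalizing the input

The task of finding a solution becomes much easier if you normalize the input with the right tools.

My idea is to use awk to select the records with $2==3, and then use sort to group the data by the numeric value of the third column: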

% echo '12,3,0.12,965.814
11,3,0.22,4313.2
14,3,0.42,7586.22
17,4,0,0
11,4,0,0
15,4,0,0
13,4,0,0
17,4,0,0
11,4,0,0
18,3,0.12,2764.86
12,3,0.22,2058.3
11,3,0.42,2929.62
10,4,0,0
10,4,0,0
14,4,0,0
12,4,0,0
19,3,0.12,1920.64
20,3,0.22,1721.51
12,3,0.42,1841.55
11,4,0,0
15,4,0,0
19,4,0,0
11,4,0,0
13,4,0,0
17,3,0.12,2738.99
12,3,0.22,1719.3
18,3,0.42,3757.72' | \
awk -F, '$2==3' | \
sort --field-separator=, --key=3,3 --numeric-sort --stable
12,3,0.12,965.814
18,3,0.12,2764.86
19,3,0.12,1920.64
17,3,0.12,2738.99
11,3,0.22,4313.2
12,3,0.22,2058.3
20,3,0.22,1721.51
12,3,0.22,1719.3
14,3,0.42,7586.22
11,3,0.42,2929.62
12,3,0.42,1841.55
18,3,0.42,3757.72
% 
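The long option names used above are the GNU sort spellings. Below is a minimal sketch of the same selection and grouping with short options. It assumes a sort that supports -s (stable), which GNU and BSD sort provide but POSIX does not require. The file name data.csv and its cut-down contents are hypothetical, and LC_ALL=C is an added assumption so that "." is read as the decimal point.

```shell
# Hypothetical sample file, cut down from the data in the question.
cat > data.csv <<'EOF'
12,3,0.12,965.814
11,3,0.22,4313.2
17,4,0,0
18,3,0.12,2764.86
12,3,0.22,2058.3
EOF

# Keep rows whose 2nd field is 3, then group them by the 3rd field.
#   -t,     use a comma as the field separator
#   -k3,3n  sort on field 3 only, numerically
#   -s      stable: rows with equal keys keep their input order
grouped=$(awk -F, '$2==3' data.csv | LC_ALL=C sort -t, -k3,3n -s)
printf '%s\n' "$grouped"
```

The stable flag matters here: without it, rows sharing the same third column could be reordered, and the moving average would no longer follow the order of the blocks in the file.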

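Why normalize the input

As you can see, the situation is now much clearer, and we can try to design an algorithm that outputs the running average of 3 elements.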

% awk -F, '$2==3' YOUR_FILE | \
sort --field-separator=, --key=3,3 --numeric-sort --stable | \
awk -F, '                                      
    $3!=prev {prev=$3
              c=0
              s[1]=0;s[2]=0;s[3]=0}
             {old=new
              new=$0
              c = c+1; i = (c-1)%3+1; s[i] = $4
              if(c>2)print old FS (s[1]+s[2]+s[3])/3}'
18,3,0.12,2764.86,1883.77
19,3,0.12,1920.64,2474.83
12,3,0.22,2058.3,2697.67
20,3,0.22,1721.51,1833.04
11,3,0.42,2929.62,4119.13
12,3,0.42,1841.55,2842.96

Oops,

I forgot your requirement to replace $4. I will come up with a solution, unless you are faster than me...

Edit: change the line

             {old=new

to

             {split(new,old,",")

and change the line

              if(c>2)print old FS (s[1]+s[2]+s[3])/3}'

to

              if(c>2) print old[1] FS old[2] FS old[3] FS (s[1]+s[2]+s[3])/3}'
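With both changes applied, the whole pipeline might look like the sketch below. It is assembled from the lines above rather than copied verbatim from the answer. The file name data.csv is hypothetical and holds the sample data from the question, the sort options are the short forms of the ones used earlier, and LC_ALL=C is an added assumption so that "." is read as the decimal point.

```shell
# Hypothetical file holding the sample data from the question.
cat > data.csv <<'EOF'
12,3,0.12,965.814
11,3,0.22,4313.2
14,3,0.42,7586.22
17,4,0,0
11,4,0,0
15,4,0,0
13,4,0,0
17,4,0,0
11,4,0,0
18,3,0.12,2764.86
12,3,0.22,2058.3
11,3,0.42,2929.62
10,4,0,0
10,4,0,0
14,4,0,0
12,4,0,0
19,3,0.12,1920.64
20,3,0.22,1721.51
12,3,0.42,1841.55
11,4,0,0
15,4,0,0
19,4,0,0
11,4,0,0
13,4,0,0
17,3,0.12,2738.99
12,3,0.22,1719.3
18,3,0.42,3757.72
EOF

result=$(awk -F, '$2==3' data.csv |
LC_ALL=C sort -t, -k3,3 -n -s |
awk -F, '
    $3 != prev {               # a new group of column-3 values starts
        prev = $3
        c = 0
        s[1] = 0; s[2] = 0; s[3] = 0
    }
    {
        split(new, old, ",")   # fields of the previous row (window centre)
        new = $0
        c = c + 1; i = (c - 1) % 3 + 1; s[i] = $4
        if (c > 2)
            print old[1] FS old[2] FS old[3] FS (s[1] + s[2] + s[3]) / 3
    }')
printf '%s\n' "$result"
```

The rows come out grouped by the third column (all 0.12 rows, then 0.22, then 0.42), not interleaved in the original file order shown in the question's desired result.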