How to calculate anomalies of a time series?

Time: 2017-03-18 02:31:58

Tags: shell awk

I have a time series of temperature data:

ifile.txt
1921  25
1922  25.1
1923  24.2
1924  23.4
1925  24.4
1926  25.1
1927  23.6
1928  25.2
1929  23.9
1930  25.6

I want to compute the anomalies for the period 1923 to 1929.

My algorithm is:

1923  24.2 - (average of the temperatures during 1923-1929)
1924  23.4 - (average of the temperatures during 1923-1929)
1925  24.4 - (average of the temperatures during 1923-1929)
1926  25.1 - (average of the temperatures during 1923-1929)
1927  23.6 - (average of the temperatures during 1923-1929)
1928  25.2 - (average of the temperatures during 1923-1929)
1929  23.9 - (average of the temperatures during 1923-1929)

My script is:

mean=$(awk '{if ($1 >= 1923 && $1 <= 1929) sum += $2; count++} END {print count ? (sum/count) : count;sum=count=0}' ifile.txt)
awk '{if ($1 >= 1923 && $1 <= 1929) printf "%4i %5.2f\n", $1, $2-'$mean'}' ifile.txt > ofile.txt

It does not print the correct values. Could you check my script?
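
A note on the likely cause: in the first command the `if` governs only `sum += $2`, so `count++` runs on every line and the sum of 7 values is divided by 10. The sketch below is one minimal way to correct it, assuming the same `ifile.txt` layout. Passing the mean with `-v mean=...` instead of splicing `$mean` between quotes is a choice made here, not part of the original script.

```shell
# Recreate the sample data from the question.
cat > ifile.txt <<'EOF'
1921  25
1922  25.1
1923  24.2
1924  23.4
1925  24.4
1926  25.1
1927  23.6
1928  25.2
1929  23.9
1930  25.6
EOF

# Pass 1: average over 1923-1929 only; sum and count are updated together.
mean=$(awk '$1 >= 1923 && $1 <= 1929 {sum += $2; count++}
            END {printf "%.6f\n", (count ? sum / count : 0)}' ifile.txt)

# Pass 2: subtract the mean, handed to awk as a variable with -v.
awk -v mean="$mean" '$1 >= 1923 && $1 <= 1929 {printf "%4d %5.2f\n", $1, $2 - mean}' ifile.txt > ofile.txt

cat ofile.txt
```

With the sample data the mean is about 24.26, so the first output line should be close to `1923 -0.06`.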

4 answers:

Answer 0: (score: 1)

Another approach, assuming the years are sorted:

awk '/1923/,/1929/ {y[++c]=$1; t[c]=$2; sum+=$2} 
     END           {avg=sum/c; 
                    for(k=1;k<=c;k++) print y[k],t[k]-avg}' file

1923 -0.0571429
1924 -0.857143
1925 0.142857
1926 0.842857
1927 -0.657143
1928 0.942857
1929 -0.357143

You can adjust the print format as needed.
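
For instance, the output could be tidied with `printf`. This is a sketch of the same array-based approach, assuming two decimals are wanted and using a numeric test on `$1` in place of the regex range (both are choices made here, not part of the answer):

```shell
# Recreate the sample data from the question.
cat > file <<'EOF'
1921  25
1922  25.1
1923  24.2
1924  23.4
1925  24.4
1926  25.1
1927  23.6
1928  25.2
1929  23.9
1930  25.6
EOF

# Collect the years and temperatures in range, then print year and anomaly.
awk '$1 >= 1923 && $1 <= 1929 {y[++c] = $1; t[c] = $2; sum += $2}
     END {
         if (c == 0) exit          # nothing in range, nothing to print
         avg = sum / c
         for (k = 1; k <= c; k++) printf "%d %6.2f\n", y[k], t[k] - avg
     }' file > anomalies.txt

cat anomalies.txt
```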

However, this can be simplified further by scanning the file twice:

$ awk '/1923/,/1929/{if (NR==FNR) {sum+=$2; c++; avg=sum/c} 
                     else print $1,$2-avg}' file{,}
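
Note that `file{,}` relies on brace expansion, which bash expands to `file file` but which is not guaranteed in a plain POSIX `sh`. The regex range `/1923/,/1929/` also matches those digits anywhere on a line. Here is a more portable sketch of the same double scan, assuming the year is always the first field:

```shell
# Recreate the sample data from the question.
cat > file <<'EOF'
1921  25
1922  25.1
1923  24.2
1924  23.4
1925  24.4
1926  25.1
1927  23.6
1928  25.2
1929  23.9
1930  25.6
EOF

# The range pattern is anchored on field 1; the file name is simply given twice.
result=$(awk '$1 == 1923, $1 == 1929 {
                  if (NR == FNR) { sum += $2; c++; avg = sum / c }   # first pass: running average
                  else print $1, $2 - avg                            # second pass: anomaly
              }' file file)

printf '%s\n' "$result"
```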

Answer 1: (score: 1)

@Kay: @try: Karakfa's solution is good; this one can serve as an alternative, and it does not use any arrays.

awk 'FNR==NR{f=1;if($1 >= 1923 && $1 <= 1929){count++;SUM+=$2;};next} FNR==1 && f==1{AVG=SUM/count;next} ($1 >= 1923 && $1 <= 1929){print $1, $2-AVG}'  Input_file  Input_file

EDIT1: Adding a non-one-liner form of the solution now.

awk 'FNR==NR{
                f=1;
                if($1 >= 1923 && $1 <= 1929){
                                                count++;
                                                SUM+=$2;
                                            };
                next
            }
     FNR==1 && f==1{
                AVG=SUM/count;
                next
                   }
     ($1 >= 1923 && $1 <= 1929){
                print $1, $2-AVG
            }
    '  Input_file  Input_file

EDIT2: Adding an explanation of the solution as well. The following is for explanation purposes only; you should run only the code above.

awk 'FNR==NR{                                               ## FNR==NR is TRUE only while the first copy of Input_file is being read. FNR and NR both count lines; the only difference is that FNR is reset whenever the next file starts, while NR keeps increasing until all files have been read.
                f=1;                                        ## set the flag variable f to 1.
                if($1 >= 1923 && $1 <= 1929){               ## if $1 (the first field) is greater than or equal to 1923 and less than or equal to 1929, do the following.
                                                count++;    ## increment count each time the condition above is satisfied.
                                                SUM+=$2;    ## add $2 to SUM, so that SUM accumulates the $2 values of all matching lines.
                                            };
                next                                        ## next is a built-in keyword that skips the remaining statements for the current line.
            }
     FNR==1 && f==1{                                        ## FNR==1 and f==1 are both TRUE when the first line of the second copy of Input_file is being read, i.e. after the first pass has finished.
                AVG=SUM/count;                              ## compute the average AVG by dividing SUM by count.
                next                                        ## skip all further statements for this line (note that this line is therefore never printed).
                   }
     ($1 >= 1923 && $1 <= 1929){                            ## if $1 is greater than or equal to 1923 and less than or equal to 1929, perform the following action.
                print $1, $2-AVG                            ## print $1 followed by $2-AVG (as per your request).
            }
    ' Input_file  Input_file                                ## Input_file is mentioned 2 times here so that it is read twice.
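
One caveat with this form: the `next` inside the `FNR==1 && f==1` rule skips the first line of the second pass, so a range that begins on line 1 would lose its first year. The sketch below drops that `next` and guards the division, assuming a made-up three-line file (`short_file` and its values are illustrative only):

```shell
# Hypothetical data whose first line is already inside the range.
cat > short_file <<'EOF'
1923  20
1924  22
1925  27
EOF

result=$(awk 'FNR==NR { f=1; if ($1 >= 1923 && $1 <= 1929) { count++; SUM += $2 }; next }
              FNR==1 && f==1 { AVG = count ? SUM / count : 0 }   # no next here, so line 1 still reaches the rule below
              ($1 >= 1923 && $1 <= 1929) { print $1, $2 - AVG }' short_file short_file)

printf '%s\n' "$result"
```

Here the average is 23, so all three years are printed, starting with `1923 -3`.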

Answer 2: (score: 1)

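You can do this by reading the same file twice: the first pass computes the average and the second pass computes the anomalies. Reading the same file twice may be slow, but there is practically zero memory overhead, and you will not get an error message such as "out of memory", because no arrays are used here.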

One-liner:

awk -v s="1923" -v e="1929" '{f=$1>=s && $1<=e}f && NR==FNR{sum+=$2; c++; next}f{ print $0, $2-(sum/c) }' file file

Explanation:

awk -v s="1923" -v e="1929" '             # call awk set var s and e
                                          # where s is starting year
                                          # e is ending year
            { 
                f=$1>=s && $1<=e          # f holds boolean status whether data is within a range
            }

f && NR==FNR{                             # if data is within a range
                                          # and we are reading file first time (FNR==NR is true only when awk reads first file), then

               sum+=$2;                   # sum column2 value
               c++;                       # increment counter
               next                       # stop processing go to next line (skipping any code below this line)
            }
                                          # Here we read same file second time
           f{                             # again are we within a range ( f holds boolean status true or false, if true then )
                print $0, $2-(sum/c)      # print current record/line/row, 2nd field minus average
            }' file file 

Input:

$ cat file
1921  25
1922  25.1
1923  24.2
1924  23.4
1925  24.4
1926  25.1
1927  23.6
1928  25.2
1929  23.9
1930  25.6

Output:

$ awk -v s="1923" -v e="1929" '{f=$1>=s && $1<=e}f && NR==FNR{sum+=$2; c++; next}f{ print $0, $2-(sum/c) }' file file
1923  24.2 -0.0571429
1924  23.4 -0.857143
1925  24.4 0.142857
1926  25.1 0.842857
1927  23.6 -0.657143
1928  25.2 0.942857
1929  23.9 -0.357143

Answer 3: (score: 1)

Yet another alternative:

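public function store($location){
    // open() returns true on success and an integer error code on failure, so compare strictly
    if($this->zip->open($location, file_exists($location) ? ZipArchive::OVERWRITE : ZipArchive::CREATE) === true){
        foreach($this->files as $file){
            $this->count++;
            $this->image_name = "OrderImg".$this->count.".png";
            // strip the data-URI prefix, then restore '+' characters that arrived as spaces
            $this->set = str_replace('data:image/png;base64,', '', $file);
            $this->set = str_replace(' ', '+', $this->set);
            // addFromString() takes an entry name and raw contents; addFile() expects a path on disk
            $this->zip->addFromString($this->image_name, base64_decode($this->set));
        }
        $this->zip->close();
    }
}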

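The for loop does not guarantee the order of iteration, but if you need a fixed order you can simply expand it into a traditional for loop.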
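
Assuming the remark refers to awk's `for (key in array)` loop, whose iteration order is unspecified, a sketch of the traditional-loop alternative could look like this (the array name `t` and the variables `s` and `e` are illustrative, not taken from the answer):

```shell
# Recreate the sample data from the question.
cat > file <<'EOF'
1921  25
1922  25.1
1923  24.2
1924  23.4
1925  24.4
1926  25.1
1927  23.6
1928  25.2
1929  23.9
1930  25.6
EOF

result=$(awk -v s=1923 -v e=1929 '
    $1 >= s && $1 <= e { t[$1] = $2; sum += $2; c++ }    # store temperatures keyed by year
    END {
        if (c == 0) exit
        avg = sum / c
        # for (y in t) could visit the years in any order; counting from s to e keeps them sorted
        for (y = s; y <= e; y++)
            if (y in t) printf "%d %.2f\n", y, t[y] - avg
    }' file)

printf '%s\n' "$result"
```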