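How to use AWK to insert missing months (with value NA) into a time series data file

Time: 2015-06-06 12:22:06

Tags: awk

I'm new to awk. I have a data file that runs from January 1958 to December 2014, but some months are missing. For example, this excerpt from the file is missing a few months: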

2009    6       0.273
2009    7       0.000
2009    10      4.07
2009    11      8.25

I need to add the missing rows with NA as the value, so the output should look like this:

2009    6       0.273
2009    7       0.000
2009    8       NA
2009    9       NA
2009    10      4.07
2009    11      8.25

I wrote the following code. It seems to work, but some values get lost:

awk 'BEGIN{OFS="\t";y=1958;m=1}{t=mktime(y" "m" 01 0 0 0");y=strftime("%Y",t);m=strftime("%m",t)*1;\
    if(y==$1 && m==$2){
        print $0;
    }else{
        print y,m,"NA";
    }
    m++
}' filename

The result looks like this:

2009    6       0.273
2009    7       0.000
2009    8       NA
2009    9       NA
2009    10      NA
2009    11      NA

I think the solution is to stay on the same input line after printing NA, but I can't figure out how to do that.

Thanks in advance.
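
For reference, the script in the question emits exactly one output line per input line, so whenever it prints NA the data on the current line is dropped. Below is a minimal sketch of the "stay on the same line" idea. It assumes plain month arithmetic in place of mktime/strftime (which are gawk extensions), and the wrapper name fill_same_line and the START_Y/START_M variables are hypothetical names introduced only for this illustration:

```shell
# Hypothetical wrapper. START_Y/START_M are assumed names; they default to
# the January 1958 start mentioned in the question.
fill_same_line() {
  awk -v y="${START_Y:-1958}" -v m="${START_M:-1}" '
    BEGIN { OFS = "\t" }
    {
      # Stay on the current input line: emit NA rows until the expected
      # (y, m) counter catches up with the date on this line.
      while (y < $1 || (y == $1 && m < $2)) {
        print y, m, "NA"
        if (++m > 12) { m = 1; y++ }
      }
      # Now the counter matches this line, so print it unchanged.
      print $0
      # The next line is expected to hold the following month.
      y = $1; m = $2 + 1
      if (m > 12) { m = 1; y++ }
    }
  ' "$@"
}
```

For the full file this would be called as `fill_same_line filename`. For the excerpt above, `START_Y=2009 START_M=6 fill_same_line filename` should give the six expected rows. Months after the last input line are not padded in this sketch.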

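3 Answers:

Answer 0 (score: 2)

I would map each year/month combination onto a contiguous sequence of integers instead of using mktime; that makes the months much easier to iterate over. It looks like this: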

awk -F '\t' '
  # provide functions to map year/month combinations to a contiguous
  # sequence of integers, and to reverse the transformation.
  function combine(y, m) { return y * 12 + (m - 1); }
  function month(c)      { return c % 12 + 1; }
  function year(c)       { return (c - month(c) + 1) / 12; }

  # In the beginning: Ensure input is split the same way as the output, and
  # prime the pump as though there had been a last line describing Dec. 1957
  # (so that Jan. 1958 comes next)
  BEGIN {
    OFS = FS
    last = combine(1957, 12)
  }

  # processing data:
  {
    # map to sequence
    this = combine($1, $2);

    # insert missing lines
    for(i = last + 1; i < this; ++i) {
      print year(i), month(i), "NA"
    }

    # start from here next time
    last = this
  }

  # then print input lines unchanged
  1' filename
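
To see why this mapping makes iteration easy, here is a small self-check of the three helper functions from the script above. The wrapper name check_mapping is hypothetical; the awk functions are copied from the answer. December 2009 and January 2010 should land on consecutive integers, and year() and month() should recover the original values:

```shell
# Hypothetical helper: prints the index plus the recovered year and month
# for two dates that straddle a year boundary.
check_mapping() {
  awk '
    function combine(y, m) { return y * 12 + (m - 1) }
    function month(c)      { return c % 12 + 1 }
    function year(c)       { return (c - month(c) + 1) / 12 }
    BEGIN {
      a = combine(2009, 12)
      b = combine(2010, 1)
      print a, year(a), month(a)
      print b, year(b), month(b)
    }
  '
}
```

Running `check_mapping` prints `24119 2009 12` followed by `24120 2010 1`, so a plain integer loop from one index to the next walks through the months in calendar order, including across year boundaries.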

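Answer 1 (score: 1)

A simple way to do this is to populate an array with the values from the input file, then loop over every year/month and print the array value if it is populated, or NA otherwise: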

$ cat tst.awk
{val[$1,$2] = $3}
END {for (y=1958;y<=2014;y++) for (m=1;m<=12;m++) print y,m,((y,m) in val ? val[y,m] : "NA")}

$ awk -f tst.awk file | grep 2009
2009 1 NA
2009 2 NA
2009 3 NA
2009 4 NA
2009 5 NA
2009 6 0.273
2009 7 0.000
2009 8 NA
2009 9 NA
2009 10 4.07
2009 11 8.25
2009 12 NA
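
As a variation on the same array approach, the year range could be passed in rather than hard-coded, and the output could be tab-separated like the input. This is only a sketch: the wrapper name fill_range and the awk variable names first and last are assumptions made for the illustration, and it reads the data from standard input:

```shell
# Hypothetical wrapper: fill_range FIRST_YEAR LAST_YEAR < file
fill_range() {
  awk -v first="$1" -v last="$2" '
    BEGIN { OFS = "\t" }
    # Remember the value for every year/month present in the input.
    { val[$1, $2] = $3 }
    # Walk the full range and print the stored value, or NA if there is none.
    END {
      for (y = first; y <= last; y++)
        for (m = 1; m <= 12; m++)
          print y, m, ((y, m) in val ? val[y, m] : "NA")
    }
  '
}
```

For the data in the question this would be called as `fill_range 1958 2014 < file`. As in the original, the whole file is held in memory until the END block runs, and months written with a leading zero (such as 06) would not match the loop counter.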

Answer 2 (score: 0)

Try this. I hope it helps too.

Note: it takes the starting year and month from the first line of the file.

awk -F"    " '
# Print an NA line for every month after (lastyear, lastmonth) and before (y, m).
function fill(y, m) {
    while (1) {
        # step to the following month, rolling over into the next year
        if (++lastmonth > 12) { lastyear++; lastmonth = 1 }
        if (lastyear > y || (lastyear == y && lastmonth >= m)) break
        print lastyear FS lastmonth FS "NA"
    }
}

# the first line sets the starting year and month
NR == 1 { lastyear = $1; lastmonth = $2 }

# fill the gap (if any) between the previous line and this one
NR > 1 { fill($1, $2) }

# print every input line unchanged and remember its date
{ print $0; lastyear = $1; lastmonth = $2 }

# pad with NA through December 2011
END { if (NR) fill(2012, 1) }
' file.dat


2009    6       0.273
2009    7       0.000
2009    8    NA
2009    9    NA
2009    10      4.07
2009    11      8.25
2009    12    NA
2010    1    NA
2010    2    NA
2010    3    NA
2010    4    NA
2010    5    NA
2010    6    NA
2010    7    NA
2010    8    NA
2010    9    NA
2010    10    NA
2010    11    NA
2010    12    NA
2011    1    NA
2011    2    NA
2011    3    NA
2011    4    NA
2011    5    NA
2011    6    NA
2011    7    NA
2011    8    NA
2011    9    NA
2011    10    NA
2011    11    NA
2011    12    NA