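I'm new to awk. I have a data file covering January 1958 to December 2014, but some data are missing. For example, here is a part of the file where a few months are missing: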
2009 6 0.273
2009 7 0.000
2009 10 4.07
2009 11 8.25
I need to add the missing rows with NA, so the output should look like this:
2009 6 0.273
2009 7 0.000
2009 8 NA
2009 9 NA
2009 10 4.07
2009 11 8.25
I wrote this code. It almost works, but some of the data get lost:
awk 'BEGIN{OFS="\t";y=1958;m=1}{t=mktime(y" "m" 01 0 0 0");y=strftime("%Y",t);m=strftime("%m",t)*1;\
if(y==$1 && m==$2){
print $0;
}else{
print y,m,"NA";
}
m++
}' filename
This is the result:
2009 6 0.273
2009 7 0.000
2009 8 NA
2009 9 NA
2009 10 NA
2009 11 NA
I think the solution is to stay on the same input line after printing NA, but I can't figure out how.
Thanks in advance.
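The idea in the last sentence can be implemented by turning the if/else into a while loop: on a mismatch, NA rows are printed until the expected month catches up with the current record, which is then printed instead of being dropped. A minimal sketch, keeping the mktime/strftime month arithmetic from the code above (these functions are not POSIX awk; gawk and some other implementations provide them) and assuming sorted, whitespace-separated input. The names fill_months and next_month are hypothetical:

```shell
#!/bin/sh
# fill_months (hypothetical name): read sorted "year month value" lines on
# stdin and print one tab-separated row per month from Jan 1958 onwards,
# writing NA for every month that has no record.
fill_months() {
    awk '
    BEGIN { OFS = "\t"; y = 1958; m = 1 }   # first month expected in the output
    {
        # emit NA rows until the expected month reaches this record,
        # instead of throwing the record away on a mismatch
        while (y < $1 || (y == $1 && m < $2)) {
            print y, m, "NA"
            next_month()
        }
        print $1, $2, $3
        next_month()
    }
    # advance (y, m) by one month; mktime normalizes month 13 to January of
    # the following year, as the original code relies on. Noon is used to
    # stay clear of daylight-saving changes around midnight.
    function next_month(   t) {
        t = mktime(y " " (m + 1) " 01 12 00 00")
        y = strftime("%Y", t) + 0
        m = strftime("%m", t) + 0
    }'
}
```

For example, piping the four sample lines through fill_months prints 623 rows (January 1958 through November 2009), with 2009 8 and 2009 9 filled in as NA. Months after the last record are still not printed.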
Answer 0 (score: 2)
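Rather than using mktime, I would map the year/month combinations to a contiguous sequence of integers; that makes it easier to iterate over them. It looks like this: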
awk -F '\t' '
# provide functions to map year/month combinations to a contiguous
# sequence of integers, and to reverse the transformation.
function combine(y, m) { return y * 12 + (m - 1); }
function month(c) { return c % 12 + 1; }
function year(c) { return (c - month(c) + 1) / 12; }

# In the beginning: Ensure input is split the same way as the output, and
# prime the pump as though there had been a last line describing Dec. 1957
# (so that Jan. 1958 comes next)
BEGIN {
    OFS = FS
    last = combine(1957, 12)
}

# processing data:
{
    # map to sequence
    this = combine($1, $2);

    # insert missing lines
    for (i = last + 1; i < this; ++i) {
        print year(i), month(i), "NA"
    }

    # start from here next time
    last = this
}

# then print input lines unchanged
1' filename
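This script only fills gaps between records, so if the file ends before December 2014 the trailing months are not printed. A minimal sketch of one way to cover that, assuming tab-separated input as in the answer and reusing its combine/month/year helpers; the END block and the wrapper name pad_series are additions for illustration, not part of the original answer:

```shell
#!/bin/sh
# pad_series (hypothetical name): the script above plus an END block that
# also emits NA rows for the months after the last record, up to Dec 2014.
pad_series() {
    awk -F '\t' '
    function combine(y, m) { return y * 12 + (m - 1) }
    function month(c)      { return c % 12 + 1 }
    function year(c)       { return (c - month(c) + 1) / 12 }
    BEGIN { OFS = FS; last = combine(1957, 12) }
    {
        # insert the months missing before this record
        this = combine($1, $2)
        for (i = last + 1; i < this; ++i) print year(i), month(i), "NA"
        last = this
    }
    # print every input line unchanged
    1
    END {
        # months after the last input line, through Dec 2014
        for (i = last + 1; i <= combine(2014, 12); ++i) print year(i), month(i), "NA"
    }'
}
```

With the four sample lines (tab-separated) as input, the output has exactly 57 * 12 = 684 rows, the last one being 2014, 12, NA.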
Answer 1 (score: 1)
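The simple way to do this is to fill an array with the values from the input file, then loop over every year/month and print the array value if it is set, or NA otherwise: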
$ cat tst.awk
{val[$1,$2] = $3}
END {for (y=1958;y<=2014;y++) for (m=1;m<=12;m++) print y,m,((y,m) in val ? val[y,m] : "NA")}
$ awk -f tst.awk file | grep 2009
2009 1 NA
2009 2 NA
2009 3 NA
2009 4 NA
2009 5 NA
2009 6 0.273
2009 7 0.000
2009 8 NA
2009 9 NA
2009 10 4.07
2009 11 8.25
2009 12 NA
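Because the range is hard-coded in the END loop, this script always prints (2014 - 1958 + 1) * 12 = 684 lines, whatever the input contains. A minimal sketch of the same array approach with the year range passed in through awk -v instead; fill_range, first and last are hypothetical names. Like the original, it assumes months are written without leading zeros, so that the keys built from the input match the keys built in the loop:

```shell
#!/bin/sh
# fill_range FIRST LAST (hypothetical name): array-based gap filling with the
# year range given on the command line; reads "year month value" lines on stdin.
fill_range() {
    awk -v first="$1" -v last="$2" '
    # remember every value that is present, keyed by (year, month)
    { val[$1, $2] = $3 }
    END {
        # walk every month in the range; print the stored value or NA
        for (y = first; y <= last; y++)
            for (m = 1; m <= 12; m++)
                print y, m, ((y, m) in val ? val[y, m] : "NA")
    }'
}
```

For instance, fill_range 1958 2014 < file reproduces the output above, while fill_range 2009 2009 < file prints only the twelve rows for 2009.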
Answer 2 (score: 0)
Try this; I hope it helps too.
Note: it takes the starting year and month from the first line of the file.
awk '
# start from the year and month of the first line
NR == 1 { y = $1; m = $2 }
{
    # fill every month between the previous line and this one with NA
    while (y < $1 || (y == $1 && m < $2)) {
        print y, m, "NA"
        if (++m > 12) { m = 1; y++ }
    }
    print $0
    # the next expected month is the one after this line
    if (++m > 12) { m = 1; y++ }
}
END {
    # pad the months after the last line up to December 2011
    # (use 2014 for the full data set)
    while (y <= 2011) {
        print y, m, "NA"
        if (++m > 12) { m = 1; y++ }
    }
}
' file.dat
2009 6 0.273
2009 7 0.000
2009 8 NA
2009 9 NA
2009 10 4.07
2009 11 8.25
2009 12 NA
2010 1 NA
2010 2 NA
2010 3 NA
2010 4 NA
2010 5 NA
2010 6 NA
2010 7 NA
2010 8 NA
2010 9 NA
2010 10 NA
2010 11 NA
2010 12 NA
2011 1 NA
2011 2 NA
2011 3 NA
2011 4 NA
2011 5 NA
2011 6 NA
2011 7 NA
2011 8 NA
2011 9 NA
2011 10 NA
2011 11 NA
2011 12 NA