Sum by year and insert 0 for missing entries

Time: 2019-03-22 14:08:07

Tags: bash awk

I have a report with year-month entries like the one below:

201703 5
201708 10
201709 20
201710 40
201711 80
201712 100
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201902 10 

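I need to sum the year-month entries by year and print each year's total after all the months of that year. Any month may be missing from the data; for those months a dummy value (0) should be inserted.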

Required output:

201703 5
201704 0
201705 0
201706 0
201707 0
201708 10
201709 20
201710 40
201711 80
201712 100
2017 255
201801 0
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201809 0
201810 0
201811 0
201812 0
2018 775
201901 0
201902 10
201903 0
2019 10

I can get the yearly summary with the command below.

awk '{ c = substr($1, 1, 4); if (NR > 1 && c != p) { print p, s; s = 0 } s += $2; p = c; print } END { print p, s }' ym.dat

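But how do I insert entries for the missing months? Also, the last entry should not go beyond the current (system time) year-month. That is, for this particular example, dummy values should not be inserted for 201904, 201905, and so on; it should stop at 201903.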

5 answers:

Answer 0: (score: 2)

You can use the following awk script, mmyy.awk:

{
   rec[$1] = $2;
   yy=substr($1, 1, 4)
   mm=substr($1, 5, 2) + 0
   ys[yy] += $2
}

NR == 1 {
   fm = mm
   fy = yy
}

END {
   for (y=fy; y<=cy; y++)
      for (m=1; m<=12; m++) {
         # print previous years sums
         if (m == 1 && y-1 in ys)
            print y-1, ys[y-1]

         if (y == fy && m < fm)
            continue;
         else if (y == cy && m > cm)
            break;

         # print year month with values or 0 if entry is missing
         k = sprintf("%d%02d", y, m)
         printf "%d%02d %d\n", y, m, (k in rec ? rec[k] : 0)
      }
   # print the sum of the last year (runs once, after both loops have finished)
   print y-1, ys[y-1]
}

Then call it as:

awk -v cy=$(date '+%Y') -v cm=$(date '+%m') -f mmyy.awk file

201703 5
201704 0
201705 0
201706 0
201707 0
201708 10
201709 20
201710 40
201711 80
201712 100
2017 255
201801 0
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201809 0
201810 0
201811 0
201812 0
2018 775
201901 0
201902 10
201903 0
2019 10
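
A side note on the invocation above: date '+%m' prints a zero-padded month such as 03, and the script relies on awk treating that -v value as a number in the m > cm test. The following is only a minimal sketch of that behaviour; the shell variables padded and verdict are hypothetical names used for illustration and are not part of the answer.

```shell
# cm is passed the way date '+%m' would print August: zero-padded.
padded=$(awk -v cm=08 'BEGIN { print cm + 0 }')

# With cm=10, loop month 9 must not count as "after" the current month.
# A string comparison would get this wrong, because "9" sorts after "10".
verdict=$(awk -v cm=10 'BEGIN { m = 9; after = (m > cm); print after }')

printf '%s %s\n' "$padded" "$verdict"
```

This prints 8 0: the padded value is read as the number 8, and 9 > 10 is evaluated numerically.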

Answer 1: (score: 2)

Using GNU awk for strftime():

$ cat tst.awk
NR==1 {
    begDate = $1
    endDate = strftime("%Y%m")
}
{
    val[$1] = $NF
    year = substr($1,1,4)
}
year != prevYear { prt(); prevYear=year }
END { prt() }

function prt(   mth, sum, date) {
    if (prevYear != "") {
        for (mth=1; mth<=12; mth++) {
            date = sprintf("%04d%02d", prevYear, mth)
            if ( (date >= begDate) && (date <=endDate) ) {
                print date, val[date]+0
                sum += val[date]
                delete val[date]
            }
        }
        print prevYear, sum+0
    }
}

$ awk -f  tst.awk file
201703 5
201704 0
201705 0
201706 0
201707 0
201708 10
201709 20
201710 40
201711 80
201712 100
2017 255
201801 0
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201809 0
201810 0
201811 0
201812 0
2018 775
201901 0
201902 10
201903 0
2019 10

With other awks, you would remove the strftime() assignment and pass endDate in instead, using awk -v endDate=$(date +'%Y%m') '...'
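
As a rough sketch of what that could look like for an awk without strftime(), the script below is tst.awk with the strftime() line dropped and endDate supplied from outside. The shell variables data and result are illustrative names, and endDate is pinned to 201903 only so the run reproduces the listing above; in normal use it would come from date +%Y%m.

```shell
# Sample data from the question, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
201703 5
201708 10
201709 20
201710 40
201711 80
201712 100
201802 0
201803 25
201804 50
201805 50
201806 150
201807 300
201808 200
201902 10
EOF

# Assumption: endDate is fixed here for reproducibility.
# Normally: awk -v endDate="$(date +%Y%m)" ...
result=$(awk -v endDate=201903 '
NR == 1 { begDate = $1 }
{ val[$1] = $NF; year = substr($1, 1, 4) }
year != prevYear { prt(); prevYear = year }
END { prt() }

# Print every month of prevYear that lies inside [begDate, endDate],
# using 0 for months without data, followed by the yearly sum.
function prt(   mth, sum, date) {
    if (prevYear != "") {
        for (mth = 1; mth <= 12; mth++) {
            date = sprintf("%04d%02d", prevYear, mth)
            if ((date >= begDate) && (date <= endDate)) {
                print date, val[date] + 0
                sum += val[date]
                delete val[date]
            }
        }
        print prevYear, sum + 0
    }
}' "$data")
rm -f "$data"

printf '%s\n' "$result"
```

Because date is built with sprintf it is a string, so the two range tests are string comparisons, which is safe here since every key has the same fixed YYYYMM width.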

Answer 2: (score: 1)

Perl to the rescue!

perl -lane '$start ||= $F[0];
            $Y{substr $F[0], 0, 4} += $F[1];
            $YM{$F[0]} = $F[1];
            END { for $y (sort keys %Y) {
                      for $m (1 .. 12) {
                          $m = sprintf "%02d", $m;
                          next if "$y$m" lt $start;
                          print "$y$m ", $YM{$y . $m} || 0;
                          last if $y == 1900 + (localtime)[5]
                               && (localtime)[4] < $m;
                      }
                      print "$y ", $Y{$y} || 0;
                  }
              }' -- file
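  • -n reads the input line by line
  • -l removes newlines from the input and adds them to the output
  • -a splits each line on whitespace into the @F array

  • substr extracts the year from the YYYYMM date. The hashes %Y and %YM use the dates as keys and the counts as values. That is why the year hash uses +=, which adds the value to the one already accumulated.

  • The END block is evaluated once the input has been exhausted.
  • It just iterates over the years stored in the hash; the range 1 .. 12 is used for the months so that zeros can be inserted (the || operator prints them).
  • next together with $start skips the months before the start of the report.
  • last skips the rest of the current year. (localtime)[4] is the zero-based month, so the test becomes true right after the current month has been printed.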
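
For comparison with the awk answers in this thread, the same "missing key prints as 0" idea that Perl's || 0 provides can be sketched in awk with + 0. This is only an illustration; the array name val and the variables has4, has5 and out are hypothetical.

```shell
out=$(awk 'BEGIN {
    val["201703"] = 5
    # A key that was never stored reads as an empty string; + 0 turns it into 0.
    print "201703", val["201703"] + 0
    print "201704", val["201704"] + 0
    # Reading val["201704"] above created that element as a side effect,
    # whereas the (key in array) test never creates anything.
    has4 = ("201704" in val)
    has5 = ("201705" in val)
    print has4, has5
}')

printf '%s\n' "$out"
```

The last line of output is 1 0, which shows why some of the awk answers test with in (or delete the element afterwards) instead of reading the element directly.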

Answer 3: (score: 1)

The following awk script will do what you expect. The idea is:

  • store the data in an array
  • print and sum only when the year changes

This gives:

# function that prints the year starting
# at month m1 and ending at m2
function print_year(m1,m2,   s,str) {
    s=0
    for(i=(m1+0); i<=(m2+0); ++i) { 
       str=y sprintf("%0.2d",i);
       print str, a[str]+0; s+=a[str]
    }
    print y,s
}

# This works for GNU awk, replace for posix with a call as
# awk -v stime=$(date "+%Y%m") -f script.awk file
BEGIN{ stime=strftime("%Y%m") }
# initializer on first record    
(NR==1){ y=substr($1,1,4); m1=substr($1,5) }
# print intermediate year
(substr($1,1,4) != y) { 
    print_year(m1,12)
    y=substr($1,1,4); m1="01";
    delete a
}
# set array value and keep track of last month
{a[$1]=$2; m2=substr($1,5)}
# check if entry is still valid (past stime or not)
($1 > stime) { exit }
# print all missing years full
# print last year upto system time month
END { 
  for (;y<substr(stime,1,4)+0;y++) { print_year(m1,12); m1=1; m2=12; }
  print_year(m1,substr(stime,5))
}

Answer 4: (score: 1)

Nice question, by the way. A Friday afternoon brain teaser. Time to go home.

In awk. An optional end time and its value are passed in as arguments:

$ awk -v arg1=201904 -v arg2=100 '          # optional parameters
function foo(ym,v) {
    while(p<ym){
        y=substr(p,1,4)                     # get year from previous round
        m=substr(p,5,2)+0                   # get month
        p=y+(m==12) sprintf("%02d",m%12+1)  # December magic
        if(m==12)
            print y,s[y]                    # print the sums (delete maybe?)
        print p, (p==ym?v:0)                # print yyyymm and 0/$2
    }
}
{
    s[substr($1,1,4)]+=$2                   # sums in array, year index
}
NR==1 {                                     # handle first record
    print
    p=$1
}
NR>1 {
    foo($1,$2)
}
END {
    if(arg1)
        foo(arg1,arg2)
    print y=substr($1,1,4),s[y]+arg2
}' file

Tail of the output:

2018 775
201901 0
201902 10
201903 0
201904 100
2019 110
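
The line commented as "December magic" in the script above advances a YYYYMM value by one month. Taken in isolation, the arithmetic can be sketched as follows; the shell function name next_month is a hypothetical wrapper added for illustration and does not appear in the answer.

```shell
# Print the year-month that follows the YYYYMM value given as $1.
next_month() {
    awk -v p="$1" 'BEGIN {
        y = substr(p, 1, 4)           # year part
        m = substr(p, 5, 2) + 0       # month part as a number
        # (m == 12) is 1 only in December, so the year is bumped exactly then;
        # m % 12 + 1 maps months 1..11 to 2..12 and month 12 back to 1.
        # Addition binds tighter than concatenation, so the year is computed first.
        print y + (m == 12) sprintf("%02d", m % 12 + 1)
    }'
}

next_month 201711
next_month 201712
```

The two calls print 201712 and 201801, so the year rollover needs no separate branch.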