Calculating a moving/rolling weekly (7-day) sum

Asked: 2015-05-28 11:31:38

Tags: unix awk

Please help me calculate the moving/rolling weekly sum of Amount ($4), per Distributor ($3) and per rolling date ($2).

I would like to be able to set variables such as:

RollingStartDate == 01/05/2015, RollingInterval == 7 and RollingEndDate == 08/05/2015

For example:

1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015

Input.csv

Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb

Example: the rolling 7-day data set for 8 May 2015 covers 08/05/2015 back to 02/05/2015:

aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb

Output for the rolling 7-day data set of 8 May 2015:

RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350

I can get the above output with this command:

awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in b) print i","b[i]}'
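
To restrict that sum to a single rolling window, the rows first have to be filtered by date. A minimal sketch, assuming the window bounds are passed in as YYYYMMDD values (the function name window_sum and the variables lo and hi are illustrative and not part of the original post):

```shell
# window_sum LO HI: read the CSV on stdin and sum Amount ($4) per Distributor ($3)
# for rows whose Date ($2, DD/MM/YYYY) falls between LO and HI (YYYYMMDD, inclusive).
window_sum() {
    awk -F, -v lo="$1" -v hi="$2" '
        NR == 1 { next }                    # skip the header row
        {
            split($2, d, "/")               # d[1]=DD, d[2]=MM, d[3]=YYYY
            key = d[3] d[2] d[1]            # fixed-width YYYYMMDD sorts in date order
            if (key >= lo && key <= hi) b[$3] += $4
        }
        END { for (i in b) print i "," b[i] }
    ' | sort                                # "for (i in b)" order is unspecified
}

# Rolling date 08/05/2015 with a 7-day window: 02/05/2015 to 08/05/2015.
window_sum 20150502 20150508 <<'EOF'
Des,Date,Distributor,Amount,Loc
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
EOF
```

With these rows the two 01/05/2015 lines are filtered out and the result is abc123,145 and xyz456,350, matching the output shown above for 8 May.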

Please suggest how to derive the per-window (weekly) data sets and then sum them.

Expected output:

RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350

Edit #1

1

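The logic is to find the total amount billed to a distributor within a 7-day range. For example, to calculate the amount for 1 May I need to consider the line items of 1 May, 30 April, 29 April, 28 April, 27 April, 26 April and 25 April, i.e. 1st May minus 6 days back. Likewise, the rolling date 2 May covers 2 May back to 26 April (2nd May minus 6 days back).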
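
The "minus 6 days back" rule can be checked with plain day arithmetic: a row belongs to the window of a rolling date when the rolling date is between 0 and 6 days after the row's date. A minimal sketch, assuming only POSIX awk and a standard integer formula for days since 1970-01-01 (day_diff and dn are hypothetical helper names):

```shell
# day_diff A B: number of whole days from date B to date A (both DD/MM/YYYY).
day_diff() {
    awk -v a="$1" -v b="$2" '
        # DD/MM/YYYY -> days since 1970-01-01, using integer arithmetic only
        function dn(s,    p, d, m, y, era, yoe, doy) {
            split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
            if (m <= 2) y--                 # count Jan/Feb as the end of the previous year
            era = int(y / 400); yoe = y - era * 400
            doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
            return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
        }
        BEGIN { print dn(a) - dn(b) }
    '
}

day_diff 01/05/2015 25/04/2015   # 6: still inside the 7-day window of 1 May
day_diff 01/05/2015 24/04/2015   # 7: outside it
```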

2

The date format is DD/MM/YYYY, so 02/05/2015 is 2 May.

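3

Since the file contains 2 to 3 months of details, I do not want to take the first date in the file (25/04/2015) and then work backwards 6 days from there. "RollingStartDate" therefore specifies the date from which data should be considered, and "RollingInterval" specifies whether the analysis looks back 7 days, 14 days or 30 days (monthly). "RollingEndDate" guards against any future-dated data that the actual file may contain; in that case, line items dated the 9th or the 15th, for example, may need to be excluded......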

1 answer:

Answer 0: (score: 6)

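Here is a solution that simply excludes the dates that do not have a full 7-day window behind them, rather than requiring a specific start/stop range: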

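$ cat tst.awk
# Assumes the input is sorted by date: the first row fixes the start, the last row the end.
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }   # window defaults to 7 days
NR==1 { print "RollingDate", $3, $4; next }   # reuse the input header names
{
    # rewrite DD/MM/YYYY as "YYYY MM DD 0 0 0" and convert it to epoch seconds
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0","",$2))
    if (begSecs=="") {
        # first date that has a full window of days behind it
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4   # total per day and distributor
    dists[$3]                   # remember every distributor seen
}
END {
    # endSecs now holds the date of the last input row
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        # add up the "window" days ending at currSecs
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]    # reset for the next rolling date
        }
    }
}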

$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145

To use a window size other than 7 days, just set it on the command line:

$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120

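The above uses GNU awk for true 2D arrays, gensub() and the time functions. Hopefully it is clear how to make whatever modifications you need to include/exclude specific date ranges.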
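
A variant with an explicit date range could look as follows. This is a minimal sketch rather than the answer's own method: it assumes the three settings from the question are passed in as awk variables, replaces mktime()/strftime() with a plain integer day count so that it runs in any POSIX awk and does not depend on the time zone, and wraps everything in a hypothetical shell function named rolling_sum (dn and dmy are likewise made-up helper names).

```shell
# rolling_sum START END [INTERVAL]: read the CSV on stdin and print, for every day
# from START to END (DD/MM/YYYY), the per-distributor sum over the last INTERVAL days.
rolling_sum() {
    awk -F, -v OFS=, -v RollingStartDate="$1" -v RollingEndDate="$2" \
        -v RollingInterval="${3:-7}" '
        # DD/MM/YYYY -> days since 1970-01-01 (integer arithmetic, no time zones)
        function dn(s,    p, d, m, y, era, yoe, doy) {
            split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
            if (m <= 2) y--
            era = int(y / 400); yoe = y - era * 400
            doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
            return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
        }
        # days since 1970-01-01 -> DD/MM/YYYY (inverse of dn)
        function dmy(z,    era, doe, yoe, y, doy, mp, d, m) {
            z += 719468
            era = int(z / 146097); doe = z - era * 146097
            yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
            y = yoe + era * 400
            doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
            mp = int((5 * doy + 2) / 153)
            d = doy - int((153 * mp + 2) / 5) + 1
            m = (mp < 10 ? mp + 3 : mp - 9)
            if (m <= 2) y++
            return sprintf("%02d/%02d/%04d", d, m, y)
        }
        NR == 1 { print "RollingDate", $3, $4; next }
        NF < 4  { next }                                       # ignore blank or short lines
        {
            amount[dn($2), $3] += $4                           # total per day and distributor
            if (!($3 in seen)) { seen[$3]; dists[++n] = $3 }   # keep first-seen order
        }
        END {
            last = dn(RollingEndDate)
            for (curr = dn(RollingStartDate); curr <= last; curr++)
                for (i = 1; i <= n; i++) {
                    sum = 0
                    for (back = 0; back < RollingInterval; back++)
                        sum += amount[curr - back, dists[i]]
                    print dmy(curr), dists[i], sum
                }
        }
    '
}

rolling_sum 01/05/2015 08/05/2015 7 <<'EOF'
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
EOF
```

Because the loop stops at RollingEndDate and each window only looks backwards, rows dated after the end date never contribute, and the input does not need to be sorted. With the sample data this prints the same rows as the expected output in the question, with abc123 before xyz456 for each date.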