Please help me calculate the moving (rolling back) weekly sum of Amount ($4), per Distributor ($3) and per Rolling Date (based on Date, $2).
I want to be able to set variables like RollingStartDate=01/05/2015, RollingInterval=7 and RollingEndDate=08/05/2015.
For example:
1st May 2015 Rolling 7 Days data set would be from 01/05/2015 to 25/04/2015
2nd May 2015 Rolling 7 Days data set would be from 02/05/2015 to 26/04/2015
....................................................................
7th May 2015 Rolling 7 Days data set would be from 07/05/2015 to 01/05/2015
8th May 2015 Rolling 7 Days data set would be from 08/05/2015 to 02/05/2015
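The windows listed above are plain day arithmetic: the window for rolling date D starts RollingInterval - 1 days before D. Below is a minimal sketch in POSIX awk, wrapped in a shell function, that prints each rolling date together with the first day of its window. The names rolling_windows, to_days and from_days are hypothetical. The date conversion uses Howard Hinnant's days_from_civil/civil_from_days formulas and is not taken from the question. It assumes DD/MM/YYYY dates with years >= 1.

```shell
#!/bin/sh
# Sketch: for every rolling date from RollingStartDate to RollingEndDate, print
# the first day of its RollingInterval-day window. POSIX awk, no gawk extensions.
# rolling_windows, to_days and from_days are hypothetical names.
rolling_windows() {
    awk -v RollingStartDate="$1" -v RollingEndDate="$2" -v RollingInterval="${3:-7}" '
    # DD/MM/YYYY -> days since 01/01/1970 (Gregorian calendar, years >= 1)
    function to_days(s,    p, d, m, y, era, yoe, doy, doe) {
        split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
        if (m <= 2) y--                     # count years from March so leap day is last
        era = int(y / 400); yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # days since 01/01/1970 -> DD/MM/YYYY (inverse of to_days)
    function from_days(z,    era, doe, yoe, doy, mp, m, y) {
        z += 719468; era = int(z / 146097); doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153); m = (mp < 10 ? mp + 3 : mp - 9)
        y = yoe + era * 400 + (m <= 2)
        return sprintf("%02d/%02d/%04d", doy - int((153 * mp + 2) / 5) + 1, m, y)
    }
    BEGIN {
        s = to_days(RollingStartDate); e = to_days(RollingEndDate)
        for (t = s; t <= e; t++)
            print from_days(t), "back to", from_days(t - RollingInterval + 1)
    }'
}

rolling_windows 01/05/2015 08/05/2015 7
```

The last call should print eight lines, from "01/05/2015 back to 25/04/2015" through "08/05/2015 back to 02/05/2015". Passing 14 or 30 as the third argument gives the longer look-back windows.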
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: the rolling 7-day data set for 8th May 2015 runs from 08/05/2015 back to 02/05/2015:
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for the 8th May 2015 rolling 7-day data set:
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
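I can get the above sums (distributor and amount, without the RollingDate column) from this command, run on the pre-filtered rows: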
awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in b) print i","b[i]}'
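The command above only works after the rows for one window have been picked out by hand. Below is a minimal sketch that lets awk do the date selection itself. It rewrites each DD/MM/YYYY date as YYYYMMDD so that plain string comparison orders dates correctly. The function name window_sum is hypothetical. The sketch assumes zero-padded dates, a header row as in Input.csv, and the CSV arriving on stdin.

```shell
#!/bin/sh
# Hypothetical helper: sum Amount ($4) per Distributor ($3) over the rows whose
# Date ($2, zero-padded DD/MM/YYYY) lies between $1 and $2 inclusive.
# Reads the CSV (with a header row) on stdin.
window_sum() {
    awk -F, -v from="$1" -v to="$2" '
    # DD/MM/YYYY -> YYYYMMDD, so that plain string comparison orders dates
    function key(s) { return substr(s, 7, 4) substr(s, 4, 2) substr(s, 1, 2) }
    BEGIN { OFS = ","; lo = key(from); hi = key(to) }
    NR == 1 { next }                              # skip the CSV header
    { k = key($2); if (k >= lo && k <= hi) sum[$3] += $4 }
    END { for (d in sum) print to, d, sum[d] }    # for (in) order is unspecified
    '
}
```

For example, `window_sum 02/05/2015 08/05/2015 < Input.csv` should print the two 08/05/2015 rows shown above, in either order.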
Please suggest how to derive the rolling weekly data sets and then sum them up.
Expected output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
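Every row of the expected output follows one rule. The notation below is introduced here only for illustration. Let a(d, t) be the total Amount for distributor d on date t, and let N be RollingInterval. The value for rolling date D is then the sum over D and the N - 1 days before it. Worked through for abc123 on 01/05/2015:

```latex
\begin{aligned}
a(d, t) &= \sum_{\substack{\text{rows } r:\ \mathrm{Distributor}_r = d,\\ \mathrm{Date}_r = t}} \mathrm{Amount}_r,
\qquad
S_N(d, D) = \sum_{k=0}^{N-1} a(d,\, D - k),\\[4pt]
S_7(\mathtt{abc123},\ 01/05/2015) &= \underbrace{25}_{25/04} + \underbrace{(250 + 100)}_{27/04} + \underbrace{25}_{30/04} + \underbrace{50}_{01/05} = 450.
\end{aligned}
```

Days in the window with no abc123 rows (26/04, 28/04 and 29/04) contribute 0. D runs from RollingStartDate to RollingEndDate.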
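Edit #1
1. The logic is to find the amount charged to each distributor within a 7-day range. For example, to calculate the amount for 1st May, I need to consider the line items of 1st May, 30th April, 29th April, 28th April, 27th April, 26th April and 25th April, i.e. 1st May minus 6 days back. Likewise, rolling date 2nd May covers 2nd May back to 26th April (2nd May minus 6 days back), and so on.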
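2. The date format is DD/MM/YYYY, e.g. 02/05/2015 is 2nd May 2015.
"RollingStartDate" determines the date from which data is considered. "RollingInterval" allows a 7-day, 14-day or monthly (30-day) look-back analysis. "RollingEndDate" guards against any future-dated rows the real file may contain; in that case, line items dated, say, the 9th or the 15th may need to be excluded...

Answer: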
Here's a solution that just skips dates that don't have a full 7 days before them, and doesn't need a specific start/stop range:
$ cat tst.awk
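BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }   # window defaults to 7 days
NR==1 { print "RollingDate", $3, $4; next }                            # header: RollingDate,Distributor,Amount
{
    # DD/MM/YYYY -> "YYYY MM DD 0 0 0" -> seconds since the epoch (local midnight).
    # mktime() uses local time, so stepping by secsPerDay assumes no daylight-saving
    # change inside the date range (running with TZ=UTC avoids that).
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0",1,$2))
    if (begSecs=="") {
        # first rolling date = first date in the file + (window-1) days (input sorted by date)
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4    # per-day, per-distributor total (gawk array of arrays)
    dists[$3]                    # remember every distributor seen
}
END {
    # endSecs now holds the last date in the file
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        # add up the "window" days ending on currSecs
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}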
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
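For every rolling date, the script above adds up all the days in the window again. Below is a sketch of a running-sum alternative. This is a swapped-in technique, not the answer's method: for each new day it adds the day entering the window and subtracts the day leaving it. The sketch uses POSIX awk with hand-rolled day counts in place of mktime()/strftime() and arrays of arrays. The names rolling_sum, to_days and from_days are hypothetical. As in the answer, the first rolling date is the earliest date plus window-1 days and the last is the latest date. Unlike the answer, it does not assume the input is sorted.

```shell
#!/bin/sh
# Running-sum variant (a swapped-in technique, not the answer's own method).
# POSIX awk: no arrays of arrays, no mktime()/strftime()/gensub().
# rolling_sum, to_days and from_days are hypothetical names.
# Usage: rolling_sum [window] < Input.csv   (window defaults to 7 days)
rolling_sum() {
    awk -v window="${1:-7}" '
    # DD/MM/YYYY -> days since 01/01/1970 (Gregorian calendar, years >= 1)
    function to_days(s,    p, d, m, y, era, yoe, doy, doe) {
        split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
        if (m <= 2) y--                     # count years from March so leap day is last
        era = int(y / 400); yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # days since 01/01/1970 -> DD/MM/YYYY (inverse of to_days)
    function from_days(z,    era, doe, yoe, doy, mp, m, y) {
        z += 719468; era = int(z / 146097); doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153); m = (mp < 10 ? mp + 3 : mp - 9)
        y = yoe + era * 400 + (m <= 2)
        return sprintf("%02d/%02d/%04d", doy - int((153 * mp + 2) / 5) + 1, m, y)
    }
    BEGIN { FS = OFS = "," }
    NR == 1 { print "RollingDate", $3, $4; next }   # header
    {
        t = to_days($2)
        if (!seen++) first = last = t               # first data row
        if (t < first) first = t                    # input need not be sorted
        if (t > last) last = t
        amt[t, $3] += $4                            # per-day, per-distributor total
        dists[$3]                                   # remember every distributor
    }
    END {
        for (t = first; t <= last; t++)
            for (d in dists) {
                # day t enters the window, day t-window leaves it (missing days count as 0)
                run[d] += amt[t, d] - amt[t - window, d]
                if (t >= first + window - 1)        # only dates with a full window
                    print from_days(t), d, run[d]
            }
    }'
}
```

`rolling_sum 7 < Input.csv` should print the same 16 rows as above, with the two distributors for each date possibly in a different order. `rolling_sum 5` corresponds to `-v window=5`. The work per rolling date no longer grows with the window size, which only matters for long look-backs such as 30 days over large files.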
To use a window size other than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
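The above uses GNU awk for true 2D arrays (arrays of arrays) and for the time functions mktime() and strftime(); gensub() is gawk-specific too. Hopefully it's clear how you can modify it to include or exclude specific date ranges.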
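Below is a sketch of one way to make that modification using the question's RollingStartDate, RollingEndDate and RollingInterval. It is written for POSIX awk with hand-rolled date arithmetic instead of gawk's mktime()/strftime(). The names rolling_range, to_days and from_days are hypothetical. Rows dated before RollingStartDate - RollingInterval + 1 or after RollingEndDate are dropped while reading, so future-dated rows never count. Distributors are printed in first-seen order.

```shell
#!/bin/sh
# Sketch: rolling sums restricted to RollingStartDate..RollingEndDate, with a
# RollingInterval-day look-back. POSIX awk; rolling_range, to_days and from_days
# are hypothetical names. Usage:
#   rolling_range RollingStartDate RollingEndDate [RollingInterval] < Input.csv
rolling_range() {
    awk -v RollingStartDate="$1" -v RollingEndDate="$2" -v RollingInterval="${3:-7}" '
    # DD/MM/YYYY -> days since 01/01/1970 (Gregorian calendar, years >= 1)
    function to_days(s,    p, d, m, y, era, yoe, doy, doe) {
        split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
        if (m <= 2) y--                     # count years from March so leap day is last
        era = int(y / 400); yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # days since 01/01/1970 -> DD/MM/YYYY (inverse of to_days)
    function from_days(z,    era, doe, yoe, doy, mp, m, y) {
        z += 719468; era = int(z / 146097); doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153); m = (mp < 10 ? mp + 3 : mp - 9)
        y = yoe + era * 400 + (m <= 2)
        return sprintf("%02d/%02d/%04d", doy - int((153 * mp + 2) / 5) + 1, m, y)
    }
    BEGIN {
        FS = OFS = ","
        s = to_days(RollingStartDate); e = to_days(RollingEndDate)
        lo = s - RollingInterval + 1                 # earliest day any window reaches back to
    }
    NR == 1 { print "RollingDate", $3, $4; next }    # header
    {
        t = to_days($2)
        if (t < lo || t > e) next                    # outside every window, e.g. future-dated rows
        if (!($3 in seen)) { seen[$3]; ord[++n] = $3 }   # distributors in first-seen order
        amt[t, $3] += $4                             # per-day, per-distributor total
    }
    END {
        for (t = s; t <= e; t++)
            for (i = 1; i <= n; i++) {
                sum = 0
                for (k = t - RollingInterval + 1; k <= t; k++)
                    sum += amt[k, ord[i]]            # days without rows add 0
                print from_days(t), ord[i], sum
            }
    }'
}
```

With the sample data, `rolling_range 01/05/2015 08/05/2015 7 < Input.csv` should reproduce the question's expected output line for line, and a row dated 09/05/2015 would be ignored. Passing 14 or 30 as the third argument gives the longer look-backs.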