我大致有以下data.frame:
df <- structure(list(WK = c("W212015", "W212015", "W222015", "W222015",
"W232015", "W232015", "W372016", "W372016", "W382016", "W382016",
"W392016", "W392016"), YEAR = c(2015, 2015, 2015, 2015, 2015,
2015, 2016, 2016, 2016, 2016, 2016, 2016), SEM = c("W21", "W21",
"W22", "W22", "W23", "W23", "W37", "W37", "W38", "W38", "W39",
"W39"), DEPTO = c("FARMA", "NO FARMA", "FARMA", "NO FARMA", "FARMA",
"NO FARMA", "FARMA", "NO FARMA", "FARMA", "NO FARMA", "FARMA",
"NO FARMA"), UNIDADES = c(1010, 1252, 1354, 1275, 1411, 1171,
1057, 1362, 1435, 1120, 1451, 1140), VENTAS = c(128270, 172776,
173312, 204000, 189074, 165111, 130011, 226092, 231035, 181440,
220552, 197220)), .Names = c("WK", "YEAR", "SEM", "DEPTO", "UNIDADES",
"VENTAS"), class = "data.frame", row.names = c(NA, -12L))
head(df)
WK YEAR SEM DEPTO UNIDADES VENTAS
1 W212015 2015 W21 FARMA 1010 128270
2 W212015 2015 W21 NO FARMA 1252 172776
3 W222015 2015 W22 FARMA 1354 173312
4 W222015 2015 W22 NO FARMA 1275 204000
5 W232015 2015 W23 FARMA 1411 189074
6 W232015 2015 W23 NO FARMA 1171 165111
数据采用整洁格式,通常建议使用。但是,当我试图计算年度变化时,我会不断遇到一些问题(计算方法为:t1 / t0-1)。
期望的结果将是这样的:
res <- structure(list(DEPTO = c("FARMA", "FARMA", "FARMA", "FARMA",
"FARMA", "FARMA", "FARMA", "FARMA", "FARMA", "FARMA", "FARMA",
"FARMA", "FARMA", "FARMA", "FARMA", "FARMA", "FARMA", "FARMA",
"FARMA", "NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA",
"NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA",
"NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA", "NO FARMA",
"NO FARMA", "NO FARMA"), SEM = c("W21", "W22", "W23", "W24",
"W25", "W26", "W27", "W28", "W29", "W30", "W31", "W32", "W33",
"W34", "W35", "W36", "W37", "W38", "W39", "W21", "W22", "W23",
"W24", "W25", "W26", "W27", "W28", "W29", "W30", "W31", "W32",
"W33", "W34", "W35", "W36", "W37", "W38", "W39"), YEAR = c(2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016), YOYDIFF = c(0.103960396039604, -0.231166912850812,
0.0276399716513112, -0.0818120351588911, 0.0528169014084507,
0.128721541155867, -0.105511811023622, -0.227333333333333, -0.103495544893763,
0.017304189435337, -0.0495652173913044, -0.239972807613868, -0.258199581297976,
-0.204038997214485, 0.281081081081081, -0.255927475592748, -0.0130718954248366,
-0.0231449965963241, 0.4169921875, 0.0511182108626198, -0.167058823529412,
-0.0341588385994877, 0.00649953574744666, 0.05859375, 0.10594512195122,
0.0839469808541973, 0.250877192982456, 0.233502538071066, 0.0282738095238095,
-0.0850077279752705, 0.103135313531353, 0.121684867394696, -0.0892448512585813,
0.137738853503185, -0.489478114478115, NA, -0.206236711552091,
-0.105180533751962)), .Names = c("DEPTO", "SEM", "YEAR", "YOYDIFF"
), row.names = c(NA, 38L), class = "data.frame")
head(res)
DEPTO SEM YEAR YOYDIFF
1 FARMA W21 2016 0.10396040
2 FARMA W22 2016 -0.23116691
3 FARMA W23 2016 0.02763997
4 FARMA W24 2016 -0.08181204
5 FARMA W25 2016 0.05281690
6 FARMA W26 2016 0.12872154
所以,我按周(SEM)和分组变量(DEPTO)做了一年一年。
通常情况下,我会使用dcast
或某些过滤和inner_joins
与dplyr
的组合执行此操作,但我遇到速度问题,看起来真的很像为简单的计算而做这一切的效率太高。
有没有&#34;通用&#34;功能或包装,以加快速度,使其更直观?我有兴趣看看是否有人使用超快data.table
或任何类似的没有进行各种连接和操作来解决这个问题。
更新
结果(YOYDIFF)将通过例如除以变量来获得 UNIDADES DEPTO =&#34; FARMA&#34;和YEAR =&#34; 2016&#34;和SEM =&#34; W21&#34;由UNIDADES划分DEPTO =&#34; FARMA&#34;和YEAR =&#34; 2015&#34;和SEM =&#34; W21&#34;最后减去1。
更新2
df
是一个非常小的示例,仅用于显示结构,而res
是在Excel上执行某些计算的结果。但是df
答案 0 :(得分:1)
也许这并不完全是你所追求的,但这是我的猜测:
library(data.table)
setDT(df)
df[,list(YOYDIFF = UNIDADES[YEAR==2016] / UNIDADES[YEAR==2015]), by=c('SEM','DEPTO')]
给出了:
SEM DEPTO YOYDIFF
1: W21 FARMA 1.0465347
2: W21 NO FARMA 1.0878594
3: W22 FARMA 1.0598227
4: W22 NO FARMA 0.8784314
5: W23 FARMA 1.0283487
6: W23 NO FARMA 0.9735269
启动数据集(已修改为可用):
df <- structure(list(WK = c("W212015", "W212015", "W222015", "W222015",
"W232015", "W232015", "W372016", "W372016", "W382016", "W382016",
"W392016", "W392016"), YEAR = c(2015, 2015, 2015, 2015, 2015,
2015, 2016, 2016, 2016, 2016, 2016, 2016), SEM = c("W21", "W21",
"W22", "W22", "W23", "W23", "W21", "W21", "W22", "W22", "W23",
"W23"), DEPTO = c("FARMA", "NO FARMA", "FARMA", "NO FARMA", "FARMA",
"NO FARMA", "FARMA", "NO FARMA", "FARMA", "NO FARMA", "FARMA",
"NO FARMA"), UNIDADES = c(1010, 1252, 1354, 1275, 1411, 1171,
1057, 1362, 1435, 1120, 1451, 1140), VENTAS = c(128270, 172776,
173312, 204000, 189074, 165111, 130011, 226092, 231035, 181440,
220552, 197220)), .Names = c("WK", "YEAR", "SEM", "DEPTO", "UNIDADES",
"VENTAS"), row.names = c(NA, -12L), class = "data.frame")