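I'm trying to compute asset-weighted returns by asset class. For the life of me, I can't figure out how to do it with the aggregate command.
My data frame looks like this: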
dat <- data.frame(company, fundname, assetclass, return, assets)
I'm trying to do something like this (don't copy it, it's wrong):
aggregate(dat, list(dat$assetclass), weighted.mean, w=(dat$return, dat$assets))
Answer 0 (score: 13)
For starters, w=(dat$return, dat$assets)) is a syntax error.
plyr makes this easier:
> set.seed(42) # fix seed so that you get the same results
> dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE),
+ return=rnorm(20), assets=1e7+1e7*runif(20))
> library(plyr)
> ddply(dat, .(assetclass), # so by asset class invoke following function
+ function(x) data.frame(wret=weighted.mean(x$return, x$assets)))
assetclass wret
1 A -2.27292
2 B -0.19969
3 C 0.46448
4 D -0.71354
5 E 0.55354
>
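For comparison, here is a minimal base R sketch of the same split-apply idea, without plyr. The point is that weighted.mean takes the values and the weights as two separate arguments, and both have to be subset to the same group. The four-row data frame is made up purely for illustration and is not the data used above.

```r
# Hypothetical toy data (illustrative values only)
dat <- data.frame(assetclass = c("A", "A", "B", "B"),
                  return     = c(0.10, 0.20, 0.05, -0.05),
                  assets     = c(100, 300, 200, 200))

# Split the rows by asset class, then compute the weighted mean within
# each piece so that returns and weights come from the same rows.
wret <- sapply(split(dat, dat$assetclass),
               function(g) weighted.mean(g$return, g$assets))
wret
# A = 0.175, B = 0
```

The result is a named numeric vector with one element per asset class.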
Answer 1 (score: 8)
A data.table solution will be faster than plyr:
library(data.table)
DT <- data.table(dat)
DT[,list(wret = weighted.mean(return,assets)),by=assetclass]
## assetclass wret
## 1: A -0.05445455
## 2: E -0.56614312
## 3: D -0.43007547
## 4: B 0.69799701
## 5: C 0.08850954
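Answer 2 (score: 6)
This can also be done easily with aggregate. It helps to remember the alternative formula for the weighted mean: sum(return * assets) / sum(assets).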
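rw <- dat$return * dat$assets                            # return times assets, per fund
dat1 <- aggregate(rw ~ assetclass, data = dat, sum)      # numerator: per-class sum of return * assets
datw <- aggregate(assets ~ assetclass, data = dat, sum)  # denominator: per-class total assets
dat1$weighted.return <- dat1$rw / datw$assets            # rows line up: both are grouped by assetclass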
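As a sanity check on that identity, the sketch below compares the sum-of-products form with weighted.mean on simulated data. The seed, group labels, and sample size are arbitrary assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
dat <- data.frame(assetclass = sample(LETTERS[1:3], 12, replace = TRUE),
                  return     = rnorm(12),
                  assets     = 1e7 + 1e7 * runif(12))

# Manual form: per-class sum(return * assets) divided by per-class sum(assets)
num    <- tapply(dat$return * dat$assets, dat$assetclass, sum)
den    <- tapply(dat$assets, dat$assetclass, sum)
manual <- num / den

# Direct form: weighted.mean applied within each class
direct <- sapply(split(dat, dat$assetclass),
                 function(g) weighted.mean(g$return, g$assets))

# The two agree up to floating-point tolerance
all.equal(as.vector(manual), as.vector(direct))
```

Both tapply and split order the groups by the sorted class labels, which is why the two vectors can be compared element by element.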
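Answer 3 (score: 0)
The recently released collapse package offers a fast solution to this and similar problems (weighted medians, modes, etc.) by providing a full set of Fast Statistical Functions that perform grouped and weighted computations internally in C++: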
library(collapse)
dat <- data.frame(assetclass = sample(LETTERS[1:5], 20, replace = TRUE),
return = rnorm(20), assets = 1e7+1e7*runif(20))
# Using collap() function with fmean, which supports weights: (by default weights are aggregated using the sum, which is prevented using keep.w = FALSE)
collap(dat, return ~ assetclass, fmean, w = ~ assets, keep.w = FALSE)
## assetclass return
## 1 A -0.4667822
## 2 B 0.5417719
## 3 C -0.8810705
## 4 D 0.6301396
## 5 E 0.3101673
# Can also use a dplyr-like workflow: (use keep.w = FALSE to omit sum.assets)
library(magrittr)
dat %>% fgroup_by(assetclass) %>% fmean(assets)
## assetclass sum.assets return
## 1 A 80683025 -0.4667822
## 2 B 27411156 0.5417719
## 3 C 22627377 -0.8810705
## 4 D 146355734 0.6301396
## 5 E 25463042 0.3101673
# Or simply a direct computation yielding a vector:
dat %$% fmean(return, assetclass, assets)
## A B C D E
## -0.4667822 0.5417719 -0.8810705 0.6301396 0.3101673