I have a data frame; let's call it dean_data:
dean_data<-data.frame(date=c("23/06/2010", "23/06/2010", "23/06/2010", "29/07/2010", "29/07/2010", "29/07/2010"),
hb=c(60, 55, 50, 80, 60, 70),
pe=c(11.5, 11.2, 11.7, 8.5, 8, 8.25),
v.d=c(2.17, 2.65, 3.66, 2.78, 2.71, 2.68))
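First, I want to calculate the mean and SD of a parameter "n". Second, I want to shuffle the row positions within each date (a factor).
So far I have only managed to do this by subsetting the data first, as shown below: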
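jun13 <- subset(dean_data, date == "23/06/2010")
B <- 1000  # number of permutations
df <- matrix(NA, nrow = B)
for (b in 1:B) {
  # shuffle each column independently, then average hb / (pe * v.d)
  df[b] <- mean(sample(jun13$hb, replace = FALSE) / (sample(jun13$pe, replace = FALSE) * sample(jun13$v.d, replace = FALSE)))
}
df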
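But I have several dates (n = 30)... I would like to learn how to do this automatically: split the data into groups and apply the repeated parameter calculation to each group. I expect the result to be a table with the mean and SD.
Answer 0 (score: 0)
Here is one attempt: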
# base R data.frame(); the original data_frame() needs the dplyr/tibble package loaded
dat = data.frame(date=c("23/06/2010", "23/06/2010", "23/06/2010",
                        "29/07/2010", "29/07/2010", "29/07/2010"),
                 hb=c(60, 55, 50, 80, 60, 70),
                 pe=c(11.5, 11.2, 11.7, 8.5, 8, 8.25),
                 v.d=c(2.17, 2.65, 3.66, 2.78, 2.71, 2.68))
dl = split(dat, dat$date) # split the data into groups, based on date
# create a helper function for readability
find_mean = function(hb, pe, v.d) {
  sapply(1:1000, function(n) {
    x = sample(hb, replace=FALSE)
    y = sample(pe, replace=FALSE)
    z = sample(v.d, replace=FALSE)
    mean(x / (y * z))
  })
}
# loop through each subset of the data and find the mean
# output is a list of 1000x1 vectors
m = lapply(dl, function(df) {
  find_mean(df$hb, df$pe, df$v.d)
})
# convert the list to a dataframe
df = as.data.frame(m)
names(df) = names(dl)
head(df)
  23/06/2010 29/07/2010
1   1.760103   3.116560
2   1.770163   3.117997
3   1.767054   3.108493
4   1.784863   3.131723
5   1.799818   3.107862
6   1.770163   3.128762
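The question asked for a table with the mean and SD per date, which the output above does not yet give. A minimal sketch of that last step, assuming the same example data and the same permutation scheme as above. The names perm_stat, n_perm and summary_tab are illustrative, not from the original post, and the seed is arbitrary:

```r
# Same example data as in the question
dean_data <- data.frame(
  date = c("23/06/2010", "23/06/2010", "23/06/2010",
           "29/07/2010", "29/07/2010", "29/07/2010"),
  hb  = c(60, 55, 50, 80, 60, 70),
  pe  = c(11.5, 11.2, 11.7, 8.5, 8, 8.25),
  v.d = c(2.17, 2.65, 3.66, 2.78, 2.71, 2.68)
)

set.seed(1)  # arbitrary seed, only so the run is reproducible

# Hypothetical helper: n_perm permuted means of hb / (pe * v.d) for one group
perm_stat <- function(g, n_perm = 1000) {
  replicate(n_perm, mean(sample(g$hb) / (sample(g$pe) * sample(g$v.d))))
}

# One vector of permuted means per date
res <- lapply(split(dean_data, dean_data$date), perm_stat)

# One row per date, holding the mean and SD of the permuted values
summary_tab <- data.frame(
  date = names(res),
  mean = unname(sapply(res, mean)),
  sd   = unname(sapply(res, sd))
)
summary_tab
```

This should scale to 30 dates without changes, since split() creates one group per distinct date. One caveat: if a date has only a single row, sample() treats a single number x as 1:x, so such groups would need special handling.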