I have a large data frame like this:
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c( 1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame (groupvar, valuevar)
   groupvar valuevar
1         A     1.00
2         A     0.50
3         A     0.50
4         A     0.50
5         B     1.00
6         B     0.75
7         B     0.75
8         C     1.00
9         C     0.80
10        C     0.80
11        C     0.80
12        D     1.00
13        D     0.90
14        D     0.90
15        E     1.00
16        E     1.50
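I want to compute the mean of valuevar for each groupvar, but leave out the first value of each group. In this data, the first value in every group is 1. For group "A", for example, the mean should be based on 0.5, 0.5, 0.5, skipping the leading 1.
This is what I tried: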
meanfun <- function(x)sum(x)-x[1]/ length(x)
ddply (myd,"groupvar",meanfun)
Error in FUN(X[[1L]], ...) :
only defined on a data frame with all numeric variables
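Two things go wrong in that attempt. ddply() hands meanfun each group as a data frame containing both columns, and sum() refuses a data frame that includes the non-numeric groupvar column. Separately, because / binds more tightly than -, meanfun computes sum(x) - (x[1] / length(x)); and once one value is dropped, the divisor should be length(x) - 1. A minimal base-R sketch of a corrected helper, assuming it is applied to the numeric valuevar vector of each group rather than to a data frame:

```r
# Example data from the question
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
              "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,
              1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Corrected meanfun: parenthesise the numerator and divide by the number
# of values that remain after dropping the first one.
# Equivalent to mean(x[-1]); gives NaN (0/0) for a one-element group.
meanfun <- function(x) (sum(x) - x[1]) / (length(x) - 1)

# Pass only the numeric column, split by group, so sum() never sees a
# data frame.
group_means <- tapply(myd$valuevar, myd$groupvar, meanfun)
print(group_means)
```

Writing the helper as mean(x[-1]), as the answers below do, avoids the precedence pitfall altogether.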
Answer 0 (score: 5)
This may help:
> with(myd, tapply(valuevar, groupvar, function(x) mean(x[-1])))
   A    B    C    D    E
0.50 0.75 0.80 0.90 1.50
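mean(x[-1]) assumes every group has at least two rows: for a one-row group, x[-1] is empty and mean() of an empty vector returns NaN. If such groups can occur, a small wrapper can return NA instead. A sketch, where mean_without_first is a hypothetical helper name and group "F" is made-up data added only to show the edge case:

```r
# Question data plus a hypothetical one-row group "F"
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
              "D", "D", "D", "E", "E", "F")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,
              1, 0.9, 0.9, 1, 1.5, 1)
myd <- data.frame(groupvar, valuevar)

# Hypothetical helper: mean of all values except the first one,
# or NA when nothing is left after dropping it.
mean_without_first <- function(x) {
  if (length(x) < 2) return(NA_real_)
  mean(x[-1])
}

res <- with(myd, tapply(valuevar, groupvar, mean_without_first))
print(res)
```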
Using aggregate:
> aggregate(valuevar ~ groupvar, FUN=function(x) mean(x[-1]), data=myd)
  groupvar valuevar
1        A     0.50
2        B     0.75
3        C     0.80
4        D     0.90
5        E     1.50
Using ddply:
> library(plyr)
> ddply (myd, "groupvar", summarize, MeanVar=mean(valuevar[-1]))
  groupvar MeanVar
1        A    0.50
2        B    0.75
3        C    0.80
4        D    0.90
5        E    1.50
Answer 1 (score: 1)
You can split the data by groupvar and apply the mean function to each piece:
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c( 1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8, 1, 0.9, 0.9, 1, 1.5)
myd <- data.frame (groupvar, valuevar)
lapply(split(myd, f=myd[, "groupvar"]), function(x) mean(x[-1,2]))
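lapply() returns a list with one element per group. If a plain named numeric vector is more convenient, vapply() can do the same split-and-apply while checking that each group yields exactly one number. A sketch that also selects the column by name rather than by position (x[-1, 2] assumes valuevar is the second column):

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
              "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,
              1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# One data frame per group; inside each, drop the first row's valuevar
# before averaging. numeric(1) makes vapply() check each result is a
# single number.
pieces <- split(myd, f = myd$groupvar)
res <- vapply(pieces, function(d) mean(d$valuevar[-1]), numeric(1))
print(res)
```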
Answer 2 (score: 0)
What I would do is create a new data frame that drops the first row of each groupvar, and then take the mean by groupvar:
myd_rmFstElement <- myd[which(duplicated(myd$groupvar)), ]
myd_means <- aggregate(valuevar ~ groupvar, FUN=mean, myd_rmFstElement)
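duplicated() is FALSE for the first occurrence of each groupvar value and TRUE for every later one, so keeping only the TRUE rows removes exactly one row per group, even if a group's rows are not contiguous. A short sketch that shows which rows get dropped; logical indexing works directly, so the which() call is optional:

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
              "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,
              1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# TRUE for every row except the first row of each group
not_first <- duplicated(myd$groupvar)
dropped_rows <- which(!not_first)
print(dropped_rows)  # rows 1, 5, 8, 12, 15

# Same filtering as myd[which(duplicated(myd$groupvar)), ]
myd_rmFstElement <- myd[not_first, ]
myd_means <- aggregate(valuevar ~ groupvar, FUN = mean, data = myd_rmFstElement)
print(myd_means)
```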