Looping through variables in ddply

Date: 2014-10-14 18:19:11

Tags: r variables dataframe plyr

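I'm trying to get the means of columns in a data frame, grouped by the unique values of another column. In this example, I want the means of columns b and c for each unique value in column a. I thought .(a) would make the calculation run per unique value of a (and the output does list the unique values of a), but every row just shows the mean of the entire column b or c.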

library(plyr)

df2 <- data.frame(a=seq(1:5), b=c(1:10), c=c(11:20))
simVars <- c("b", "c")
for (var in simVars) {
  print(var)
  dat = ddply(df2, .(a), summarize, mean_val = mean(df2[[var]]))  ## my script
  assign(var, dat)
}
c

  a mean_val
1 1     15.5
2 2     15.5
3 3     15.5
4 4     15.5
5 5     15.5

How can I average the columns by the unique values of column a?

Thanks

1 Answer:

Answer 0 (score: 0)

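You don't need a loop. Just compute the means of b and c in a single call to ddply, and ddply will compute them separately for each value of a. Also, as @Gregor said, you don't need to repeat the data frame name inside mean(): `df2[[var]]` always refers to the whole column of df2, not to the rows of the current group, which is why every group got the overall mean.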

library(plyr)

ddply(df2, .(a), summarise, 
      mean_b=mean(b),
      mean_c=mean(c))

  a mean_b mean_c
1 1    3.5   13.5
2 2    4.5   14.5
3 3    5.5   15.5
4 4    6.5   16.5
5 5    7.5   17.5
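If you do want to keep the loop over `simVars`, a minimal sketch is to pass ddply an anonymous function instead of `summarise`. ddply calls that function once per level of a with only that group's rows (here named `d`), so `d[[var]]` is evaluated per group rather than over the whole column. The `results` list is a hypothetical container used in place of `assign()`.

```r
library(plyr)

df2 <- data.frame(a = seq(1:5), b = c(1:10), c = c(11:20))
simVars <- c("b", "c")

# Hypothetical container: one data frame of group means per variable
results <- list()
for (var in simVars) {
  # d holds only the rows of the current group of a, so the mean is per group
  results[[var]] <- ddply(df2, .(a), function(d) data.frame(mean_val = mean(d[[var]])))
}

results$c
```

Storing the results in a named list keeps them together and avoids creating a global variable called `c`, which would mask the name of the `c()` function when printed.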

Update: to get a separate data frame for each column:

# Add a few additional columns to the data frame
df2 = data.frame(a=seq(1:5), b=c(1:10), c=c(11:20), d=c(21:30), e=c(31:40))

# New data frame with means by each level of column a
# (summarise_each(funs(mean)) is the older, now deprecated dplyr spelling of this step)
library(dplyr)
dfmeans = df2 %>%
  group_by(a) %>%
  summarise(across(everything(), mean))

# Separate each column of means into its own data frame and store them in a list
means.list = lapply(names(dfmeans)[-1], function(x) {
  as.data.frame(dfmeans[, c("a", x)])
})

means.list

[[1]]
  a   b
1 1 3.5
2 2 4.5
3 3 5.5
4 4 6.5
5 5 7.5

[[2]]
  a    c
1 1 13.5
2 2 14.5
3 3 15.5
4 4 16.5
5 5 17.5

[[3]]
  a    d
1 1 23.5
2 2 24.5
3 3 25.5
4 4 26.5
5 5 27.5

[[4]]
  a    e
1 1 33.5
2 2 34.5
3 3 35.5
4 4 36.5
5 5 37.5