Question

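I found some Stack Overflow questions that are very similar, but their answers are not what I am after ("Loop through columns and apply ddply", "Aggregate / summarize multiple variables per group (i.e. sum, mean, etc)").

The main difference is that those answers simplify the problem by avoiding for loops (and apply) and using aggregate (or similar) instead. However, I already have a large amount of code that smoothly produces various summaries, statistics, and plots, so what I really want is to get a loop or a function working. The problem I currently face is getting from the column name stored as q in the loop to the actual column (get() does not work for me). See below.

My dataset resembles the following, but with 40 features: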

Subject <- c(rep(1, times = 6), rep(2, times = 6))
GroupOfInterest <- c(letters[rep(1:3, times = 4)])
Feature1 <- sample(1:20, 12, replace = T)
Feature2 <- sample(400:500, 12, replace = T)
Feature3 <- sample(1:5, 12, replace = T)
df.main <- data.frame(Subject,GroupOfInterest, Feature1, Feature2, 
Feature3, stringsAsFactors = FALSE)

So far, my attempt uses a for loop:

Feat <- c(colnames(df.main[3:5]))    
for (q in Feat){
df_sum = ddply(df.main, ~GroupOfInterest + Subject,
            summarise, q =mean(get(q)))
  }

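I would like it to produce output like the following (though I realize it would currently need a separate merge step):

However, depending on how I write it, I either get an error ("Error in get(q): invalid first argument") or I get the mean over all values of a feature rather than a mean grouped by Subject and GroupOfInterest.

I also tried using a list and lapply, but ran into similar difficulties.

From what I gather, my problem is that ddply expects the bare column name Feature1. When I loop, however, I supply either "Feature1" (a string) or the values themselves (1, 14, 14, 16, 17, ...), which are no longer part of the data frame that the grouping by Subject and GroupOfInterest needs.

Any help solving this, and an explanation of how it works, would be much appreciated.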

Answer 1

Edited per the comments; as.character(.) needs to be included.

Could you use summarise_at with the helper vars(contains(...))?

library(dplyr)

# Mean of every column whose name contains "Feature", per Subject/GroupOfInterest
df.main %>% 
    group_by(Subject, GroupOfInterest) %>% 
    summarise_at(vars(contains("Feature")), funs(mean(as.numeric(as.character(.)))))
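
In newer dplyr releases (1.0.0 and later, as far as I know), summarise_at() and funs() have been superseded or deprecated in favour of across(). Below is a minimal sketch of the same idea, assuming the feature columns are already numeric so that the as.character() conversion is not needed. The set.seed() call and the name df_means are my own additions for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sampled data is reproducible

# Same structure as the data frame in the question
df.main <- data.frame(
  Subject = c(rep(1, times = 6), rep(2, times = 6)),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1 = sample(1:20, 12, replace = TRUE),
  Feature2 = sample(400:500, 12, replace = TRUE),
  Feature3 = sample(1:5, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

# One row per Subject/GroupOfInterest pair, one mean per Feature column
df_means <- df.main %>%
  group_by(Subject, GroupOfInterest) %>%
  summarise(across(contains("Feature"), mean), .groups = "drop")

print(df_means)
```

With 40 features the call stays the same, as long as the column names share a pattern that contains() can match.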

Answer 2

A dplyr solution is given above, but to be fair, here is a data.table one:

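library(data.table)

# setDT() converts df.main to a data.table by reference
DT <- setDT(df.main)
# .SD holds the columns named in .SDcols for each Subject/GroupOfInterest group
DT[, lapply(.SD, function(x) mean(as.numeric(as.character(x)))),
   .SDcols = names(DT)[grepl("Feature", names(DT))],
   by = .(Subject, GroupOfInterest)]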

   Subject GroupOfInterest Feature1 Feature2 Feature3
1:       1               a      6.5    459.5      2.0
2:       1               b     11.0    480.5      4.0
3:       1               c      9.5    453.0      4.5
4:       2               a      3.5    483.0      1.5
5:       2               b      8.0    449.0      3.5
6:       2               c     11.5    424.0      1.0
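
A slightly shorter variant is sketched below, assuming a reasonably recent data.table in which .SDcols accepts patterns() (I believe this arrived in version 1.12.0) and assuming the feature columns are already numeric. The set.seed() call and the name feature_means are illustrative additions.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sampled data is reproducible

# Same structure as the data frame in the question, built directly as a data.table
DT <- data.table(
  Subject = c(rep(1, times = 6), rep(2, times = 6)),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1 = sample(1:20, 12, replace = TRUE),
  Feature2 = sample(400:500, 12, replace = TRUE),
  Feature3 = sample(1:5, 12, replace = TRUE)
)

# Select the feature columns by regular expression and average them per group
feature_means <- DT[, lapply(.SD, mean),
                    by = .(Subject, GroupOfInterest),
                    .SDcols = patterns("^Feature")]

print(feature_means)
```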

Answer 3

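The OP is using a simple for loop for the data transformation. I understand that there are many other, more optimized ways to solve this, but to respect what the OP asked for, I have tried a for-loop-based solution. I use dplyr because plyr is outdated now.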

library(dplyr)
Subject <- c(rep(1, times = 6), rep(2, times = 6))
GroupOfInterest <- c(letters[rep(1:3, times = 4)])
Feature1 <- sample(1:20, 12, replace = T)
Feature2 <- sample(400:500, 12, replace = T)
Feature3 <- sample(1:5, 12, replace = T)
#small change in the way data.frame is created 
df.main <- data.frame(Subject,GroupOfInterest, Feature1, Feature2, 
 Feature3, stringsAsFactors = FALSE)

Feat <- c(colnames(df.main[3:5])) 

# Ready with Key columns on which grouping is done
resultdf <- unique(select(df.main, Subject, GroupOfInterest))
#> resultdf
#  Subject GroupOfInterest
#1       1               a
#2       1               b
#3       1               c
#7       2               a
#8       2               b
#9       2               c


# For loop over each feature column
for(q in Feat){
  summean <- paste0('mean(', q, ')')
  summ_name <- paste0(q) # Name of the column that stores the mean
  # Note: summarise_() is deprecated in recent dplyr releases
  df_sum <- df.main %>% 
    group_by(Subject, GroupOfInterest) %>%
    summarise_(.dots = setNames(summean, summ_name)) 
  # Merge the new mean column into resultdf
  resultdf <- merge(resultdf, df_sum, by = c("Subject", "GroupOfInterest"))
}

# Final result
#> resultdf
#  Subject GroupOfInterest Feature1 Feature2 Feature3
#1       1               a      6.5    473.0      3.5
#2       1               b      4.5    437.0      2.0
#3       1               c     12.0    415.5      3.5
#4       2               a     10.0    437.5      3.0
#5       2               b      3.0    447.0      4.5
#6       2               c      6.0    462.0      2.5
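
If staying with plyr is preferable because the rest of the code depends on it, the loop from the question can also be made to work by handing ddply a plain function instead of summarise. The function receives the sub-data-frame for one group, so the column can be looked up by its string name with [[ ]], which sidesteps get(). This is only a sketch under the question's data layout; the set.seed() call is my addition.

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sampled data is reproducible

# Same structure as the data frame in the question
df.main <- data.frame(
  Subject = c(rep(1, times = 6), rep(2, times = 6)),
  GroupOfInterest = letters[rep(1:3, times = 4)],
  Feature1 = sample(1:20, 12, replace = TRUE),
  Feature2 = sample(400:500, 12, replace = TRUE),
  Feature3 = sample(1:5, 12, replace = TRUE),
  stringsAsFactors = FALSE
)

Feat <- colnames(df.main)[3:5]

# Start from the unique grouping keys, then merge in one mean column per feature
resultdf <- unique(df.main[, c("Subject", "GroupOfInterest")])

for (q in Feat) {
  # d is the sub-data-frame for one Subject/GroupOfInterest pair;
  # d[[q]] fetches the column whose name is stored in the string q
  df_sum <- ddply(df.main, c("Subject", "GroupOfInterest"),
                  function(d) setNames(data.frame(mean(d[[q]])), q))
  resultdf <- merge(resultdf, df_sum, by = c("Subject", "GroupOfInterest"))
}

print(resultdf)
```

Because each iteration returns a one-column summary named after q, the merge step accumulates the feature means side by side, giving the same shape as the result shown above.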

How do I use ddply on multiple columns with a for loop?
