Question

I have a loop that I would like to get rid of, but I can't quite see how to do it. Say I have a data frame:

tmp = data.frame(Gender = rep(c("Male", "Female"), each = 6), 
                 Ethnicity = rep(c("White", "Asian", "Other"), 4),
                 Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

I then want to calculate the mean Score for each level of the Gender and Ethnicity columns, which would give:

$Female
[1] 9.5

$Male
[1] 3.5

$Asian
[1] 6.5

$Other
[1] 7.5

$White
[1] 5.5

This is easy enough to do, but I don't want to use a loop, because I'm after speed. Currently I have the following:

for(i in c("Gender", "Ethnicity"))
    print(lapply(split(tmp$Score, tmp[, i]), function(x) mean(x)))

Clearly, this uses a loop, and that is where I'm stuck.

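There may be a function that already does this sort of thing that I'm not aware of. I have looked at aggregate, but I don't think it is what I want.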

Answer 1

You can sapply() over the names of tmp other than Score, and then use by() (or aggregate()):

> sapply(setdiff(names(tmp),"Score"),function(xx)by(tmp$Score,tmp[,xx],mean))
$Gender
tmp[, xx]: Female
[1] 9.5
------------------------------------------------------------
tmp[, xx]: Male
[1] 3.5

$Ethnicity
tmp[, xx]: Asian
[1] 6.5
------------------------------------------------------------
tmp[, xx]: Other
[1] 7.5
------------------------------------------------------------
tmp[, xx]: White
[1] 5.5

However, this still uses a loop internally, so it won't speed things up much...
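For the aggregate() route mentioned above, a minimal sketch might look like the following. It assumes that every column other than Score is a grouping variable; group_cols and means_list are names made up for this illustration.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Assumption: every column except Score is a grouping column
group_cols <- setdiff(names(tmp), "Score")

# Build one data frame of group means per grouping column
means_list <- lapply(group_cols, function(col) {
  aggregate(tmp["Score"], by = tmp[col], FUN = mean)
})
names(means_list) <- group_cols

means_list$Gender
```

As with sapply(), lapply() is still a loop under the hood, so this is more about tidiness than speed.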

Answer 2

Using dplyr and tidyr:

 library(dplyr)
 library(tidyr)
 tmp[,1:2] <- lapply(tmp[,1:2], as.character)
 tmp %>% 
     gather(Var1, Var2, Gender:Ethnicity) %>%
     unite(Var, Var1, Var2) %>% 
     group_by(Var) %>% 
     summarise(Score=mean(Score))

  #              Var Score
  #1 Ethnicity_Asian   6.5
  #2 Ethnicity_Other   7.5
  #3 Ethnicity_White   5.5
  #4   Gender_Female   9.5
  #5     Gender_Male   3.5

Answer 3

You can nest the apply functions.

sapply(c("Gender", "Ethnicity"),
       function(i) {
         print(lapply(split(tmp$Score, tmp[, i]), function(x) mean(x)))
       })

Answer 4

You can use the following code:

c(tapply(tmp$Score,tmp$Gender,mean),tapply(tmp$Score,tmp$Ethnicity,mean))
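If there are more than two grouping columns, the same idea can be written without typing out each tapply() call. This is only a sketch, and it assumes every column other than Score is a grouping variable; group_cols and all_means are illustrative names.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Assumption: every column except Score is a grouping column
group_cols <- setdiff(names(tmp), "Score")

# One tapply() per grouping column, then concatenate the results with c()
per_col <- lapply(group_cols, function(col) tapply(tmp$Score, tmp[[col]], mean))
all_means <- do.call(c, per_col)

all_means
```

Note that the result loses track of which column each level came from, so two columns sharing a level name would be hard to tell apart.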

Answer 5

Try the reshape2 package.

require(reshape2)

#demo
melted<-melt(tmp)
casted.gender<-dcast(melted,Gender~variable,mean) #for mean of each gender
casted.eth<-dcast(melted,Ethnicity~variable,mean) #for mean of each ethnicity

#now, combining to do for all variables at once
variables<-colnames(tmp)[-length(colnames(tmp))]

casting<-function(var.name){
    return(dcast(melted,melted[,var.name]~melted$variable,mean))
}

lapply(variables, FUN=casting)

Output:

[[1]]
  melted[, var.name] Score
1             Female   9.5
2               Male   3.5

[[2]]
  melted[, var.name] Score
1              Asian   6.5
2              Other   7.5
3              White   5.5

Answer 6

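You should reconsider the output you are generating. A single list mixing all the ethnicity and gender levels is probably not the best way to plot, analyse, or present your data. You are probably better off breaking it apart and writing two lines of code instead of one, using either tapply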

tapply(tmp$Score, tmp$Gender, mean)
tapply(tmp$Score, tmp$Ethnicity, mean)

or aggregate

aggregate(Score ~ Gender, tmp, mean)
aggregate(Score ~ Ethnicity, tmp, mean)

Then you might also want to look at the interaction, even though you suggested that aggregate doesn't do what you really want.

with(tmp, tapply(Score, list(Gender, Ethnicity), mean))
aggregate(Score ~ Gender + Ethnicity, tmp, mean)

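Not only do these better separate and present the underlying ideas that the variables represent, but the R commands are also more expressive and reflect the intent of the data, which coded these variables separately in the first place.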

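If your real task involves many variables, then any of these can be put into a loop, but I would suggest that you still want the output not as a single flat list but as a list of vectors or data.frames.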
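One way to get such a list of vectors, as a rough sketch: it assumes every column other than Score is a grouping variable, and group_cols and means_by_var are names invented for this example.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Assumption: every column except Score is a grouping column
group_cols <- setdiff(names(tmp), "Score")

# simplify = FALSE keeps one named vector of means per grouping column;
# sapply() names the list elements after the column names
means_by_var <- sapply(group_cols,
                       function(col) tapply(tmp$Score, tmp[[col]], mean),
                       simplify = FALSE)

means_by_var$Gender
means_by_var$Ethnicity
```

Each element keeps its own variable name, so the Gender means and the Ethnicity means stay separate instead of being flattened together.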
