Apply a function to each subgroup

Date: 2016-01-13 00:01:01

Tags: r

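I would like to know how to use a looping function to compute

apply(table(data$people,data$event),2,function(x) mean(x[x>0]))

for each level of Colour. That is, I want to evaluate the expression above separately for every level of Colour.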

people <-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event<-c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
Colour<-c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red")

data<-data.frame(people,event,Colour)

1 Answer:

Answer 0: (score: 1)

To run your code on each group, first wrap it in a function:

your_function = function(data) {
    apply(table(data$people,data$event),2,function(x) mean(x[x>0]))
}

Then we can split your data by Colour and apply your function to each sub-data-frame:

dat_split = split(data, f = data$Colour)
results = lapply(dat_split, your_function)

results
# $black
#   a   b   c   f   m   M   s   y 
#   1 NaN NaN NaN NaN NaN NaN NaN 
#
# $blue
#   a   b   c   f   m   M   s   y 
#   1   1   1   1 NaN NaN   1 NaN 
#
# $grean
#   a   b   c   f   m   M   s   y 
# NaN NaN NaN NaN NaN NaN NaN   1 
# ...
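
A list of named vectors can be awkward to scan. As a minimal sketch, assuming the columns are factors (so that each subgroup's table keeps every event level, which is what produces the NaN entries above), the per-colour results can be stacked into one matrix with do.call(rbind, ...). Since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is set explicitly here. The name result_matrix is introduced only for this illustration.

```r
# Rebuild the question's data. stringsAsFactors = TRUE keeps all event
# levels in every subgroup's table (the default became FALSE in R 4.0.0).
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
            "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a",
            "a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red",
            "black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour, stringsAsFactors = TRUE)

# Mean of the non-zero counts per event, as in the question.
your_function <- function(data) {
  apply(table(data$people, data$event), 2, function(x) mean(x[x > 0]))
}

# One named vector per colour; all have the same length because the
# event factor levels are shared across the subsets.
results <- lapply(split(data, f = data$Colour), your_function)

# Stack the vectors: rows are colours, columns are events.
result_matrix <- do.call(rbind, results)

# R2 has two "b" events among the green rows, so this entry is 2.
result_matrix["green", "b"]
```

If the columns are plain character vectors instead, each subgroup's table contains only the events present in that subgroup, the vectors differ in length, and rbind is no longer appropriate.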

Personally, I don't find this very friendly. data.table and dplyr make it easy to work on subsets of a data frame. I would use dplyr from the start, like this:

library(dplyr)
data %>% group_by(people, Colour, event) %>%
    summarize(n = n()) %>%
    group_by(Colour, event) %>%
    summarize(mean = mean(n)) %>%
    tidyr::spread(key = event, value = mean)

# Source: local data frame [6 x 9]
#
#   Colour     a     b     c     f     m     M     s     y
#   (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1  black     1    NA    NA    NA    NA    NA    NA    NA
# 2   blue     1     1     1     1    NA    NA     1    NA
# 3  grean    NA    NA    NA    NA    NA    NA    NA     1
# ...
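
In current tidyr releases, spread() is superseded by pivot_wider(), and dplyr's count() can stand in for the first group_by()/summarize(n = n()) pair. The following is a sketch of the same pipeline under those assumptions (dplyr >= 1.0.0 and tidyr >= 1.0.0 installed). The name wide is illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question.
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
            "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a",
            "a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red",
            "black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour)

wide <- data %>%
  # Number of rows per person/colour/event combination (column n).
  count(people, Colour, event) %>%
  # Average those counts over people within each colour/event pair.
  group_by(Colour, event) %>%
  summarize(mean = mean(n), .groups = "drop") %>%
  # One column per event; combinations that never occur become NA.
  pivot_wider(names_from = event, values_from = mean)

wide
```

Only combinations that actually occur are counted, so no x > 0 filter is needed, and missing colour/event pairs show up as NA rather than NaN.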