Table by group using dplyr

Posted: 2018-10-09 12:37:26

Tags: r dplyr frequency

Here is my reproducible example.

HAVE <- data.frame(ID=c(1,2,3,4,5,6),
                   CLASS=c("A","A","B","B","C","C"),
                   AGE=c(14,13,11,12,14,14),
                   GENDER=c('MALE','MALE','FEMALE','MALE','FEMALE','FEMALE'),
                   stringsAsFactors = TRUE)  # keep CLASS/GENDER as factors; the default became FALSE in R 4.0.0


WANT <- data.frame(COLUMN=c('AGE','GENDER = MALE'),
                   CLASSA=c(13.5,100),
                   CLASSB=c(11.5,50),
                   CLASSC=c(14,0))

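Essentially, the goal is to create a new data frame that shows, for each CLASS, the mean of the numeric variables and the percentage of each level of the factor variables.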

This is my coding attempt, which does not produce WANT:
HAVE %>%
  count(HAVE[,]) %>%
  group_by(CLASS) %>%
  mutate(mean)

4 answers:

Answer 0 (score: 2)

Using the tables package, you can get:

library(tables)
tabular(AGE*mean+GENDER*Percent("col") ~ CLASS,HAVE)
#                       CLASS         
#                       A     B    C  
#        AGE    mean     13.5 11.5  14
# GENDER FEMALE Percent   0.0 50.0 100
#        MALE   Percent 100.0 50.0   0

You can subset it to keep only the MALE row (dropping row 2, FEMALE):

tabular(AGE*mean+GENDER*Percent("col") ~ CLASS,HAVE) [-2,]

#               CLASS        
#               A     B    C 
#  AGE  mean     13.5 11.5 14
#  MALE Percent 100.0 50.0  0
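The `Percent("col")` term gives percentages within each column, i.e. within each CLASS, which is why the two GENDER rows of every CLASS column add up to 100. As a cross-check that does not use the tables package at all, a minimal base-R sketch with `tapply()` and `prop.table(..., margin = 2)` should reproduce the same numbers (`age_mean` and `gender_pct` are just illustrative names):

```r
HAVE <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                   CLASS = c("A", "A", "B", "B", "C", "C"),
                   AGE = c(14, 13, 11, 12, 14, 14),
                   GENDER = c("MALE", "MALE", "FEMALE", "MALE", "FEMALE", "FEMALE"))

# Mean AGE per CLASS (the "AGE mean" row of the tabular output)
age_mean <- tapply(HAVE$AGE, HAVE$CLASS, mean)

# Column percentages: margin = 2 rescales each CLASS column to sum to 1,
# then * 100 turns the proportions into percentages
gender_pct <- prop.table(table(HAVE$GENDER, HAVE$CLASS), margin = 2) * 100

age_mean
gender_pct
```

`table()` sorts the values, so the rows come out as FEMALE, MALE and the columns as A, B, C, matching the layout above.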

Answer 1 (score: 1)

Something like:

HAVE %>% 
    select(GENDER, AGE, CLASS) %>% 
    group_by(CLASS) %>% 
    summarise(AGE = mean(AGE), GENDER_MALE = sum(ifelse(GENDER == "MALE", 1, 0))*100/n()) %>% 
    t()

Output:

            [,1]   [,2]   [,3]  
CLASS       "A"    "B"    "C"   
AGE         "13.5" "11.5" "14.0"
GENDER_MALE "100"  " 50"  "  0" 

Answer 2 (score: 1)

Try a base R solution (the GENDER row is the proportion of MALE; multiply by 100 for a percentage):

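list.out <- 
  lapply(HAVE[-(1:2)], function(x){
        # non-numeric column (factor or character): indicator for the second level, here "MALE"
        if(!is.numeric(x)) x <- x == levels(factor(x))[2]
        # mean per CLASS; aggregate() returns the groups in sorted order
        aggregate(x, list(HAVE$CLASS), mean)$x})

out <- do.call(rbind, list.out)
colnames(out) <- sort(unique(HAVE$CLASS))  # same (sorted) order as aggregate()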

out
#           A    B  C
# AGE    13.5 11.5 14
# GENDER  1.0  0.5  0

Answer 3 (score: 0)

This should work:

HAVE %>% 
  group_by(CLASS) %>% 
  summarise(mean_age = mean(AGE), percent_male = mean(GENDER == "MALE")*100) %>% 
  t()

You get:

             [,1]   [,2]   [,3]  
CLASS        "A"    "B"    "C"   
mean_age     "13.5" "11.5" "14.0"
percent_male "100"  " 50"  "  0" 
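Both `t()`-based answers return a character matrix, not a data frame: `t()` on a data frame goes through `as.matrix()`, and because the summary contains the non-numeric CLASS column, every cell is turned into a string (hence the quoted values). If you want a data frame laid out like WANT with numeric columns, one option is to reshape with tidyr instead of transposing. This is a sketch assuming dplyr and tidyr 1.0 or later; `WANT2` is a hypothetical name:

```r
library(dplyr)
library(tidyr)

HAVE <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                   CLASS = c("A", "A", "B", "B", "C", "C"),
                   AGE = c(14, 13, 11, 12, 14, 14),
                   GENDER = c("MALE", "MALE", "FEMALE", "MALE", "FEMALE", "FEMALE"))

WANT2 <- HAVE %>%
  group_by(CLASS) %>%
  summarise(AGE = mean(AGE),
            `GENDER = MALE` = mean(GENDER == "MALE") * 100) %>%
  # long form: one row per CLASS x statistic; the values stay numeric
  pivot_longer(-CLASS, names_to = "COLUMN", values_to = "value") %>%
  # wide form: one column per CLASS, named CLASSA, CLASSB, CLASSC
  pivot_wider(names_from = CLASS, values_from = value, names_prefix = "CLASS")

WANT2
```

The long-then-wide reshape does what `t()` was used for, but keeps each column's type instead of coercing everything to character.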

Looks like Penguin beat me to it, but I'll post mine anyway, since a few small things in the code can be streamlined (very minor).

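If you want the mean of every numeric variable and a percentage for every factor, I believe you could do it with nest(), map() and unnest(). Maybe someone can provide the code for that.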
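None of the answers spells out that generic version. As one possible alternative (a different technique from the nest()/map()/unnest() route mentioned), here is a sketch built on `across()`, `pivot_longer()`, `count()` and `complete()`, assuming dplyr 1.0 and tidyr 1.0 or later. It takes the mean of every numeric column and the percentage of every level of every factor column within each CLASS, and it assumes ID is an identifier to skip; `have_vars`, `num_part`, `fac_part` and `WANT2` are illustrative names. Unlike WANT, it keeps a row for every level (GENDER = FEMALE as well as GENDER = MALE):

```r
library(dplyr)
library(tidyr)

# stringsAsFactors = TRUE so CLASS and GENDER are factors on R >= 4.0 too
HAVE <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                   CLASS = c("A", "A", "B", "B", "C", "C"),
                   AGE = c(14, 13, 11, 12, 14, 14),
                   GENDER = c("MALE", "MALE", "FEMALE", "MALE", "FEMALE", "FEMALE"),
                   stringsAsFactors = TRUE)

have_vars <- select(HAVE, -ID)  # assumption: ID is not a variable to summarise

# Every numeric column: mean per CLASS, in long form
num_part <- have_vars %>%
  group_by(CLASS) %>%
  summarise(across(where(is.numeric), mean)) %>%
  pivot_longer(-CLASS, names_to = "COLUMN", values_to = "value")

# Every factor column except CLASS: percentage of each level per CLASS
fac_part <- have_vars %>%
  pivot_longer(where(is.factor) & !CLASS, names_to = "var", values_to = "level") %>%
  count(CLASS, var, level) %>%
  # add combinations that never occur (e.g. MALE in class C) with n = 0
  complete(CLASS, nesting(var, level), fill = list(n = 0L)) %>%
  group_by(CLASS, var) %>%
  mutate(value = 100 * n / sum(n)) %>%
  ungroup() %>%
  transmute(CLASS, COLUMN = paste(var, "=", level), value)

# Stack both parts and spread CLASS into CLASSA, CLASSB, CLASSC
WANT2 <- bind_rows(num_part, fac_part) %>%
  pivot_wider(names_from = CLASS, values_from = value, names_prefix = "CLASS")

WANT2
```

Adding another numeric or factor column to HAVE should need no code change; the `complete()` step is what keeps the 0% cells that `count()` would otherwise drop.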