Here is my reproducible example.
HAVE <- data.frame(ID=c(1,2,3,4,5,6),
CLASS=c("A","A","B","B","C","C"),
AGE=c(14,13,11,12,14,14),
GENDER=c('MALE','MALE','FEMALE','MALE','FEMALE','FEMALE'))
WANT <- data.frame(COLUMN=c('AGE','GENDER = MALE'),
CLASSA=c(13.5,100),
CLASSB=c(11.5,50),
CLASSC=c(14,0))
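Essentially, the goal is to create a new data frame that shows, for each CLASS, the mean of the numeric variables and the percentage of the factor variables.
This is my coding attempt: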
HAVE %>%
count(HAVE[,]) %>%
group_by(CLASS) %>%
mutate(mean)
Answer 0 (score: 2)
Using the tables package, you can get:
library(tables)
tabular(AGE*mean+GENDER*Percent("col") ~ CLASS,HAVE)
#                       CLASS
#                       A     B    C
# AGE           mean     13.5 11.5  14
# GENDER FEMALE Percent   0.0 50.0 100
#        MALE   Percent 100.0 50.0   0
You can drop the FEMALE row to keep only MALE:
tabular(AGE*mean+GENDER*Percent("col") ~ CLASS,HAVE) [-2,]
#              CLASS
#              A     B    C
# AGE  mean     13.5 11.5  14
# MALE Percent 100.0 50.0   0
Answer 1 (score: 1)
Something like:
library(dplyr)
HAVE %>%
select(GENDER, AGE, CLASS) %>%
group_by(CLASS) %>%
summarise(AGE = mean(AGE), GENDER_MALE = sum(ifelse(GENDER == "MALE", 1, 0))*100/n()) %>%
t()
Output:
            [,1]   [,2]   [,3]
CLASS       "A"    "B"    "C"
AGE         "13.5" "11.5" "14.0"
GENDER_MALE "100"  " 50"  "  0"
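Note that t() on a data frame that still contains the character column CLASS returns a character matrix, which is why the numbers above are quoted. A minimal sketch that keeps the values numeric and reproduces the WANT layout, assuming the tidyr package is available (it is not used in the answer above, and the name res is only illustrative):

```r
library(dplyr)
library(tidyr)

HAVE <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                   CLASS = c("A", "A", "B", "B", "C", "C"),
                   AGE = c(14, 13, 11, 12, 14, 14),
                   GENDER = c("MALE", "MALE", "FEMALE", "MALE", "FEMALE", "FEMALE"))

res <- HAVE %>%
  group_by(CLASS) %>%
  # One row per CLASS: mean age and percentage of MALE
  summarise(AGE = mean(AGE),
            `GENDER = MALE` = mean(GENDER == "MALE") * 100) %>%
  # Long format: one row per (CLASS, statistic) pair
  pivot_longer(-CLASS, names_to = "COLUMN") %>%
  # Wide format: one column per CLASS, prefixed to match WANT
  pivot_wider(names_from = CLASS, values_from = value, names_prefix = "CLASS")

res
```

The result is a tibble with columns COLUMN, CLASSA, CLASSB and CLASSC, where the class columns stay numeric.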
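Answer 2 (score: 1)
Try a base R solution:
# Note: is.factor() is only TRUE here if HAVE was built with
# stringsAsFactors = TRUE (the default before R 4.0.0).
list.out <-
lapply(HAVE[-(1:2)], function(x){
# For a factor, flag the second level ("MALE" here) so mean() gives its proportion
if(is.factor(x)) x <- x == levels(x)[2]
aggregate(x, list(HAVE$CLASS), mean)$x})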
out <- do.call(rbind, list.out)
colnames(out) <- unique(HAVE$CLASS)
out
#           A    B  C
# AGE    13.5 11.5 14
# GENDER  1.0  0.5  0
Answer 3 (score: 0)
This should work.
library(dplyr)
HAVE %>%
group_by(CLASS) %>%
summarise(mean_age = mean(AGE), percent_male = mean(GENDER == "MALE")*100) %>%
t()
You get:
             [,1]   [,2]   [,3]
CLASS        "A"    "B"    "C"
mean_age     "13.5" "11.5" "14.0"
percent_male "100"  " 50"  "  0"
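Looks like penguin beat me to it, but I'll post mine anyway, since a few small things in the code are slightly more streamlined (very minor).
If you want the mean of every numeric variable and a percentage for every factor, I believe you could use nest(), map() and unnest(). Maybe someone can provide code for that.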
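One way to generalize this without nest(), map() and unnest() is a plain base R loop. The sketch below is only an illustration under stated assumptions, not the approach suggested above: the helper name summarise_by_group and its arguments are hypothetical, it treats every non-numeric column as categorical, and it reports a percentage for every level rather than only MALE.

```r
# Hypothetical helper: mean of each numeric column and percentage of each
# level of every other column, computed within the groups of `group`.
summarise_by_group <- function(df, group, skip = character()) {
  g <- as.character(df[[group]])
  groups <- sort(unique(g))
  rows <- list()
  for (col in setdiff(names(df), c(group, skip))) {
    x <- df[[col]]
    if (is.numeric(x)) {
      # Numeric column: one mean per group
      rows[[paste(col, "mean")]] <-
        vapply(groups, function(k) mean(x[g == k]), numeric(1))
    } else {
      # Categorical column: one row per level, share of that level in percent
      x <- as.character(x)
      for (lvl in sort(unique(x))) {
        rows[[paste(col, "=", lvl)]] <-
          vapply(groups, function(k) mean(x[g == k] == lvl) * 100, numeric(1))
      }
    }
  }
  # Stack the named vectors into a matrix: one row per statistic, one column per group
  do.call(rbind, rows)
}

HAVE <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                   CLASS = c("A", "A", "B", "B", "C", "C"),
                   AGE = c(14, 13, 11, 12, 14, 14),
                   GENDER = c("MALE", "MALE", "FEMALE", "MALE", "FEMALE", "FEMALE"))

out <- summarise_by_group(HAVE, group = "CLASS", skip = "ID")
out
```

For the example data this gives a numeric matrix with rows "AGE mean", "GENDER = FEMALE" and "GENDER = MALE" and columns A, B and C. Because it converts non-numeric columns with as.character(), it does not depend on whether GENDER was stored as a factor or as a character vector.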