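I have a data frame that lists, for each group (ID), the total number of students (Stu) and the number of students who took part in an activity (Sub):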
     ID   Stu   Sub
  (int) (int) (int)
1   101    80    NA
2   102   130    NA
3   103    10    NA
4   104   210    20
5   105   180    NA
6   106   150    NA
For each size range (> 400, > 200, > 100, > 0), I want to know how many groups took part in the activity (Sub > 0) and how many did not (Sub is NA).
library(dplyr)

# Sample data
output <- structure(list(ID = c(101L, 102L, 103L, 104L, 105L, 106L),
                         Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
                         Sub = c(NA, NA, NA, 20L, NA, NA)),
                    .Names = c("ID", "Stu", "Sub"),
                    class = c("tbl_df", "data.frame"),
                    row.names = c(NA, -6L))

# My attempt: bin Stu into size ranges, then count per bin
temp <- output %>%
  mutate(Stu = ifelse(Stu >= 400, 400,
               ifelse(Stu >= 200, 200,
               ifelse(Stu >= 100, 100, 0)))) %>%
  group_by(Stu) %>%
  summarise(entries = length(!is.na(Sub)),
            noentries = length(is.na(Sub)))
The result should be:
    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0
But I get:
    Stu entries noentries
  (dbl)   (int)     (int)
1     0       2         2
2   100       3         3
3   200       1         1
How can I make the length function inside summarise behave like countif?
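Answer 0 (score: 3)
summarise needs a single value per group. length returns the number of elements in the logical vector no matter how many of them are TRUE, whereas sum counts the TRUE values, so using sum instead of length does the job: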
output %>%
  mutate(Stu = ifelse(Stu >= 400, 400,
               ifelse(Stu >= 200, 200,
               ifelse(Stu >= 100, 100, 0)))) %>%
  group_by(Stu) %>%
  summarise(entries = sum(!is.na(Sub)),
            noentries = sum(is.na(Sub)))
Source: local data frame [3 x 3]

    Stu entries noentries
  (dbl)   (int)     (int)
1     0       0         2
2   100       0         3
3   200       1         0
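The difference between the two functions is easy to see on a bare logical vector. Here is a minimal base R sketch; the vector name sub is made up for illustration and simply mirrors the Sub column from the question:

```r
# Hypothetical vector mirroring the Sub column from the question
sub <- c(NA, NA, NA, 20L, NA, NA)

# length() counts elements, whether they are TRUE or FALSE
n_length <- length(!is.na(sub))

# sum() coerces TRUE to 1 and FALSE to 0, so it counts the TRUEs
n_entries   <- sum(!is.na(sub))
n_noentries <- sum(is.na(sub))

print(c(length = n_length, entries = n_entries, noentries = n_noentries))
```

Here length gives 6 for either condition, while sum gives 1 entry and 5 non-entries, which is the countif-style behaviour the question asks for.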
Answer 1 (score: 3)
Following the same idea as @eipi10, but going straight to count() instead of group_by() %>% tally(), and showing that tidyr::spread can mimic reshape2::dcast:
output %>%
  count(Sub = ifelse(is.na(Sub), 'No Entries', 'Entries'),
        Stu = cut(Stu, c(0, 100, 200, 400, +Inf), labels = c(0, 100, 200, 400))) %>%
  tidyr::spread(Sub, n, fill = 0)
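In more recent tidyr releases, spread() is documented as superseded by pivot_wider(). The sketch below shows how the same reshaping might look with it, assuming tidyr 1.0.0 or later and dplyr are installed. The data frame is rebuilt with data.frame() only so the block runs on its own, and the name result is made up for illustration:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, rebuilt so this block is self-contained
output <- data.frame(ID  = 101:106,
                     Stu = c(80L, 130L, 10L, 210L, 180L, 150L),
                     Sub = c(NA, NA, NA, 20L, NA, NA))

result <- output %>%
  # Recode Sub and bin Stu, then count rows per combination
  count(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
        Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400))) %>%
  # One column per Sub category; combinations that never occur are filled with 0
  pivot_wider(names_from = Sub, values_from = n, values_fill = list(n = 0L))

print(result)
```

The named-list form of values_fill is used here because it should be accepted across pivot_wider() versions; a bare scalar is an option only in later releases.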
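Answer 2 (score: 1)
Another option is to group by both Stu and Sub, but to do that we first need to recode the values of Sub and Stu to match the output groupings we want. We also use cut rather than nested ifelse to set the break points for Stu: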
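library(dplyr)
library(reshape2)

output %>%
  group_by(Sub = ifelse(is.na(Sub), "No Entries", "Entries"),
           Stu = cut(Stu, c(0, 100, 200, 400, Inf), labels = c(0, 100, 200, 400))) %>%
  tally %>%
  # value.var names the count column explicitly instead of letting dcast guess it
  dcast(Stu ~ Sub, value.var = "n", fill = 0)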
  Stu Entries No Entries
1   0       0          2
2 100       0          3
3 200       1          0
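One thing to keep in mind when swapping nested ifelse for cut: with its default right = TRUE, cut builds intervals that are closed on the right, such as (0,100], whereas the ifelse version in the question uses >= and so includes the lower bound. The sample data has no value exactly on a break point, so both give the same result there. The base R sketch below uses made-up sizes that sit on the breaks to show where the two approaches differ; all variable names are chosen for illustration:

```r
# Made-up group sizes chosen to sit exactly on the break points
stu <- c(10, 100, 150, 200, 400)

# Nested ifelse from the question: lower bounds are inclusive (>=)
by_ifelse <- ifelse(stu >= 400, 400,
             ifelse(stu >= 200, 200,
             ifelse(stu >= 100, 100, 0)))

# cut() default (right = TRUE): intervals are (0,100], (100,200], ...
by_cut_default <- cut(stu, c(0, 100, 200, 400, Inf),
                      labels = c(0, 100, 200, 400))

# right = FALSE gives [0,100), [100,200), ... and matches the ifelse version
by_cut_left <- cut(stu, c(0, 100, 200, 400, Inf),
                   labels = c(0, 100, 200, 400), right = FALSE)

print(data.frame(stu, by_ifelse, by_cut_default, by_cut_left))
```

With the default, a group of exactly 100 students lands in the 0 bin, while right = FALSE puts it in the 100 bin, as the ifelse version does. Which one is appropriate depends on whether the ranges are meant as > or >=.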