对于以下数据-我想计算每年每班的学生人数。
Class Students Gender Height Year_1999 Year_2000 Year_2001 Year_2002
1 Mark M 180 80 54 22 12
2 John M 234 0 59 32 62
1 Tom M 124 0 53 26 12
2 Jane F 180 80 54 22 0
3 Kim F 140 0 2 3 32
输出应为
Class Year_1999 Year_2000 Year_2001 Year_2002
1 1 2 2 2
2 1 2 2 1
3 0 1 1 1
我尝试了以下方法,但运气不佳
Number_obs = df %>%
group_by(class) %>%
summarise(count=n())
答案 0 :(得分:1)
我们可以在summarise_at
中使用dplyr
。按“类别”分组后,循环浏览matches
列名称中具有“ year” summarise_at
的列,获得sum
的值不等于0
library(dplyr)
df1 %>%
group_by(Class) %>%
summarise_at(vars(matches("Year")), list(~ sum(as.logical(.))))
# A tibble: 3 x 5
# Class Year_1999 Year_2000 Year_2001 Year_2002
# <int> <int> <int> <int> <int>
#1 1 1 2 2 2
#2 2 1 2 2 1
#3 3 0 1 1 1
或者我们可以gather
转换为“长”格式,对单列执行group_by
操作,然后spread
转换为“宽”格式
library(tidyr)
df1 %>%
gather(key, val, matches("Year")) %>%
group_by(Class, key) %>%
summarise(val = sum(val != 0)) %>%
spread(key, val)
或使用data.table
library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(as.logical(x))), .(Class), .SDcols = 5:8]
或将base R
与aggregate
一起使用
aggregate(.~ Class, df1[-(2:4)], function(x) sum(x != 0))
# Class Year_1999 Year_2000 Year_2001 Year_2002
#1 1 1 2 2 2
#2 2 1 2 2 1
#3 3 0 1 1 1
或使用rowsum
rowsum(+(!!df1[5:8]), df1$Class)
# Year_1999 Year_2000 Year_2001 Year_2002
#1 1 2 2 2
#2 1 2 2 1
#3 0 1 1 1
或使用colSums
t(sapply(split(as.data.frame(df1[5:8] != 0), df1$Class), colSums))
df1 <- structure(list(Class = c(1L, 2L, 1L, 2L, 3L), Students = c("Mark",
"John", "Tom", "Jane", "Kim"), Gender = c("M", "M", "M", "F",
"F"), Height = c(180L, 234L, 124L, 180L, 140L), Year_1999 = c(80L,
0L, 0L, 80L, 0L), Year_2000 = c(54L, 59L, 53L, 54L, 2L), Year_2001 = c(22L,
32L, 26L, 22L, 3L),
Year_2002 = c(12L, 62L, 12L, 0L, 32L)), class = "data.frame",
row.names = c(NA,
-5L))
答案 1 :(得分:1)
类似于 @akrun 的colSums
解决方案,使用by
。
do.call(rbind, by(df[5:8] > 0, df[1], colSums))
# Year_1999 Year_2000 Year_2001 Year_2002
# 1 1 2 2 2
# 2 1 2 2 1
# 3 0 1 1 1
或
Reduce(rbind, by(df[5:8] > 0, df[1], colSums))
# Year_1999 Year_2000 Year_2001 Year_2002
# init 1 2 2 2
# 1 2 2 1
# 0 1 1 1
do.call
更快。
答案 2 :(得分:0)
使用dplyr
,我们可以使用summarise_at
library(dplyr)
df %>%
group_by(Class) %>%
summarise_at(vars(starts_with("Year")), ~sum(. != 0))
# Class Year_1999 Year_2000 Year_2001 Year_2002
# <int> <int> <int> <int> <int>
#1 1 1 2 2 2
#2 2 1 2 2 1
#3 3 0 1 1 1