I am trying to summarize my data to show how many courses each teacher is scheduled to teach.
Basically, my data looks like this:
Id | Subject
123| algebra
123| geometry
123| algebra II
456| calc
456| calc
789| geometry
789| geometry
789| calc
and I need it to look like this:
Id | Subject count
123| 3
456| 1
789| 2
I have no idea where to start, because I don't want to simply count the number of courses they teach; I want the number of DIFFERENT courses. Please help!
Answer 0 (score: 0)
We can group by 'Id' and use n_distinct inside summarise to get the count of distinct 'Subject' values:
library(dplyr)
df1 %>%
group_by(Id) %>%
summarise(Subject_Count = n_distinct(Subject))
# A tibble: 3 x 2
# Id Subject_Count
# <int> <int>
#1 123 3
#2 456 1
#3 789 2
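To see why a plain row count is not what the question asks for, here is a small sketch that puts n() and n_distinct() side by side. The column names Rows and Distinct_Subjects are made up for illustration, and df1 is rebuilt inline only so the snippet runs on its own.

```r
library(dplyr)

# Same sample data as in the question, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Id) %>%
  summarise(
    Rows = n(),                              # counts every row, duplicates included
    Distinct_Subjects = n_distinct(Subject)  # counts each subject once per Id
  )
res
```

For Id 456 the two columns differ (2 rows, but only 1 distinct subject), which is exactly the case the question is worried about.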
Alternatively, with data.table: convert the data frame to a data.table (setDT(df1)), group by 'Id', and use uniqueN to get the distinct count:
library(data.table)
setDT(df1)[,.(Subject_Count = uniqueN(Subject)), by = Id]
# Sample data used in the examples above
df1 <- structure(list(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc")),
  class = "data.frame", row.names = c(NA, -8L))
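If neither package is available, the same count can probably be had in base R as well. This is not part of the answer above, just a sketch using aggregate with length(unique(x)) as the summary function; the result name Subject_Count is chosen to match the earlier output.

```r
# Same sample data as above, repeated so the snippet runs on its own
df1 <- data.frame(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc"),
  stringsAsFactors = FALSE
)

# For each Id, drop duplicate subjects and count what is left
res <- aggregate(Subject ~ Id, data = df1,
                 FUN = function(x) length(unique(x)))

# aggregate keeps the original column name, so rename it for clarity
names(res)[2] <- "Subject_Count"
res
```

This trades speed for zero dependencies; on large data the dplyr or data.table versions are likely the better choice.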