I want to fill my panel dataset with rows based on the existing ranges of a grouping variable.
For better understanding, here is a sample dataset:
> df<-data.frame(Student=c(1, 1, 2), Year=c(1,2,2), Type=c("Test","Exam","Test"),Points=c(80,140,30))
> df
  Student Year Type Points
1       1    1 Test     80
2       1    2 Exam    140
3       2    2 Test     30
For each student and each year, I want two summary rows, one per point range. After the transformation it should look like this:
> df2<-data.frame(Student=c(1, 1, 1,1,2,2,2,2), Year=c(1,1,2,2,1,1,2,2), PointRange=c("0_100","100_200","0_100","100_200","0_100","100_200","0_100","100_200"), n_tests=c(1,0,0,0,0,0,1,0), n_exams=c(0,0,0,1,0,0,0,0))
> df2
  Student Year PointRange n_tests n_exams
1       1    1      0_100       1       0
2       1    1    100_200       0       0
3       1    2      0_100       0       0
4       1    2    100_200       0       1
5       2    1      0_100       0       0
6       2    1    100_200       0       0
7       2    2      0_100       1       0
8       2    2    100_200       0       0
I have already tried the following with the dplyr package:
df %>% mutate(PointRange = case_when(Points >= 0 & Points <= 100 ~ 1, Points >= 101 & Points <= 200 ~ 2)) %>%
+ group_by(Student, Year, PointRange) %>%
+ summarise(n_tests = sum(Type == "Test"),
+ n_exams = sum(Type=="Exam"))
# A tibble: 3 x 5
# Groups: Student, Year [?]
  Student  Year PointRange n_tests n_exams
    <dbl> <dbl>      <dbl>   <int>   <int>
1       1     1          1       1       0
2       1     2          2       0       1
3       2     2          1       1       0
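Five rows are still missing: I need both point ranges for every student in every year, including the combinations that have no observations. How can I solve this?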
Answer 0: (score: 0)
You can use cut() to create the ranges and then tidyr::complete() to create all combinations of student, year, and range:
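library(dplyr)
library(tidyr)  # complete() lives in tidyr, not dplyr

result <- df %>%
  # bin Points into [0,100) and [100,200)
  mutate(PointRange = cut(Points, breaks = c(0, 100, 200), right = FALSE)) %>%
  # add a row (with NA Type/Points) for every missing combination
  complete(Student, Year, PointRange) %>%
  group_by(Student, Year, PointRange) %>%
  summarize(
    n_tests = sum(Type == "Test", na.rm = TRUE),
    n_exams = sum(Type == "Exam", na.rm = TRUE)
  )
result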
# A tibble: 8 x 5
# Groups: Student, Year [?]
  Student  Year PointRange n_tests n_exams
    <dbl> <dbl> <fct>        <int>   <int>
1    1.00  1.00 [0,100)          1       0
2    1.00  1.00 [100,200)        0       0
3    1.00  2.00 [0,100)          0       0
4    1.00  2.00 [100,200)        0       1
5    2.00  1.00 [0,100)          0       0
6    2.00  1.00 [100,200)        0       0
7    2.00  2.00 [0,100)          1       0
8    2.00  2.00 [100,200)        0       0
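If the range labels should read 0_100 and 100_200 as in the desired output, a variation along these lines might work. It is only a sketch under a few assumptions: dplyr 1.0.0 or later (for the .groups argument), the name result_labeled is made up for illustration, and include.lowest = TRUE is added so that a score of exactly 200 would land in the top bin rather than becoming NA. Note that with right = FALSE a score of exactly 100 falls into 100_200, whereas the case_when() in the question puts it in the lower range.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Student = c(1, 1, 2),
  Year    = c(1, 2, 2),
  Type    = c("Test", "Exam", "Test"),
  Points  = c(80, 140, 30)
)

result_labeled <- df %>%
  # labels = names the bins as in the desired output (assumed naming)
  mutate(PointRange = cut(Points, breaks = c(0, 100, 200),
                          labels = c("0_100", "100_200"),
                          right = FALSE, include.lowest = TRUE)) %>%
  group_by(Student, Year, PointRange) %>%
  summarise(n_tests = sum(Type == "Test"),
            n_exams = sum(Type == "Exam"),
            .groups = "drop") %>%
  # summarise first, then add the missing combinations with zero counts
  complete(Student, Year, PointRange,
           fill = list(n_tests = 0L, n_exams = 0L))

result_labeled
```

Here the counting happens before complete(), so the fill argument supplies the zeros directly and na.rm is not needed. Because PointRange is a factor, complete() expands over all of its levels, including ranges that never occur in the data.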