Fill missing rows in a panel data set based on ranges

Time: 2018-10-26 15:28:27

Tags: r dplyr panel complete

I want to fill my panel data set with rows so that every range of a grouping variable is present for each group.

To make this easier to follow, here is an example data set:

> df<-data.frame(Student=c(1, 1, 2), Year=c(1,2,2), Type=c("Test","Exam","Test"),Points=c(80,140,30))
> df
  Student Year Type Points
1       1    1 Test     80
2       1    2 Exam    140
3       2    2 Test     30

For each student and each year I would like two summary rows, one per point range. After the transformation it should look like this:

> df2<-data.frame(Student=c(1, 1, 1,1,2,2,2,2), Year=c(1,1,2,2,1,1,2,2), PointRange=c("0_100","100_200","0_100","100_200","0_100","100_200","0_100","100_200"), n_tests=c(1,0,0,0,0,0,1,0), n_exams=c(0,0,0,1,0,0,0,0))
> df2
  Student Year PointRange n_tests n_exams
1       1    1      0_100       1       0
2       1    1    100_200       0       0
3       1    2      0_100       0       0
4       1    2    100_200       0       1
5       2    1      0_100       0       0
6       2    1    100_200       0       0
7       2    2      0_100       1       0
8       2    2    100_200       0       0

I have already tried the following with the dplyr package:

df %>%
  mutate(PointRange = case_when(Points >= 0 & Points <= 100 ~ 1,
                                Points >= 101 & Points <= 200 ~ 2)) %>%
  group_by(Student, Year, PointRange) %>%
  summarise(n_tests = sum(Type == "Test"),
            n_exams = sum(Type == "Exam"))
# A tibble: 3 x 5
# Groups:   Student, Year [?]
  Student  Year PointRange n_tests n_exams
    <dbl> <dbl>      <dbl>   <int>   <int>
1       1     1          1       1       0
2       1     2          2       0       1
3       2     2          1       1       0

What is still missing are the other five rows, so that every student has both point ranges in every year. How can I solve this?

1 answer:

Answer 0: (score: 0)

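You can use cut() to create the ranges and then tidyr::complete() (complete() is exported by tidyr, not dplyr) to create all combinations of student, year and range: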

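library(dplyr)
library(tidyr)  # complete() lives in tidyr

result <- df %>%
  # right = FALSE gives left-closed intervals: [0,100) and [100,200)
  mutate(PointRange = cut(Points, breaks = c(0, 100, 200), right = FALSE)) %>%
  # add the missing Student/Year/PointRange rows; Type and Points are NA there
  complete(Student, Year, PointRange) %>%
  group_by(Student, Year, PointRange) %>%
  summarize(
    # na.rm = TRUE makes the NA rows added by complete() count as 0
    n_tests = sum(Type == "Test", na.rm = TRUE),
    n_exams = sum(Type == "Exam", na.rm = TRUE)
  )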

# A tibble: 8 x 5
# Groups:   Student, Year [?]
  Student  Year PointRange n_tests n_exams
    <dbl> <dbl> <fct>        <int>   <int>
1    1.00  1.00 [0,100)          1       0
2    1.00  1.00 [100,200)        0       0
3    1.00  2.00 [0,100)          0       0
4    1.00  2.00 [100,200)        0       1
5    2.00  1.00 [0,100)          0       0
6    2.00  1.00 [100,200)        0       0
7    2.00  2.00 [0,100)          1       0
8    2.00  2.00 [100,200)        0       0
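
Note that with right = FALSE a score of exactly 100 falls into [100,200), whereas the case_when() in the question puts it into the first range. The following is only a sketch of one possible variant, not the answerer's code: it assumes the df from the question, uses the labels "0_100" and "100_200" from the desired output, and closes the intervals on the right so that 100 stays in the first range. The name result2 is made up for illustration. It summarises first and lets complete() fill the missing combinations with zeros.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  Student = c(1, 1, 2),
  Year    = c(1, 2, 2),
  Type    = c("Test", "Exam", "Test"),
  Points  = c(80, 140, 30)
)

result2 <- df %>%
  # Default right = TRUE plus include.lowest = TRUE gives [0,100] and (100,200]
  mutate(PointRange = cut(Points,
                          breaks = c(0, 100, 200),
                          labels = c("0_100", "100_200"),
                          include.lowest = TRUE)) %>%
  group_by(Student, Year, PointRange) %>%
  summarise(n_tests = sum(Type == "Test"),
            n_exams = sum(Type == "Exam")) %>%
  # Drop the remaining grouping so complete() can expand all three columns
  ungroup() %>%
  # Unused factor levels of PointRange are included; counts default to 0
  complete(Student, Year, PointRange,
           fill = list(n_tests = 0L, n_exams = 0L))

result2
```

Because PointRange is a factor, complete() also expands over levels that never occur in the data, which is why both ranges appear for every student and year.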