How to group by academic year?

Time: 2017-05-15 20:26:42

Tags: r date

I have a data frame like this:

  data.frame(
        date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
        n= 1:14
   )

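How can I summarise n by academic year? Each academic year should run from December to August of the following year. For example, I would like to sum n within each academic year. Recoding by hand is not an option, because there are too many values and some of them are occasionally missing.

In the end, the recoding should look like this: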

  date         a.y.

"2012-05-01" 2011/2012
"2012-08-01" 2011/2012

"2012-12-01" 2012/2013
"2013-05-01" 2012/2013
"2013-08-01" 2012/2013

"2013-12-01" 2013/2014
"2014-05-01" 2013/2014

"2014-12-01" 2014/2015
"2015-05-01" 2014/2015
"2015-08-01" 2014/2015

"2015-12-01" 2015/2016
"2016-05-01" 2015/2016
"2016-08-01" 2015/2016

"2016-12-01" 2016/2017

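As you can see, the dates follow a similar pattern, but each academic year may contain different dates.

2 answers:

Answer 0: (score: 1)

If I am reading this correctly, the academic year changes as soon as we see a December entry. If that is true, the following code will work.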

library(data.table)
library(lubridate)
df =   data.frame(
  date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
                    16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
  n= 1:14
)

df$AcademicYear = cumsum(month(df$date) == 12)
setDT(df)
df[ , .(Sum = sum(n)), by = .(AcademicYear)]

   AcademicYear Sum
1:            0   3
2:            1  12
3:            2  13
4:            3  27
5:            4  36
6:            5  14
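
To make the cumsum step easier to follow, here is a minimal base R sketch of the same idea without data.table or lubridate. The variable names (is_december, group_id, sums) are only illustrative. It assumes the rows are sorted by date and that each academic year has exactly one December row.

```r
# Same 14 dates as in the question
dates <- as.Date(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                   16405, 16556, 16648, 16770, 16922, 17014, 17136),
                 origin = "1970-01-01")
n <- 1:14

# TRUE on every December row, i.e. where a new academic year starts
is_december <- format(dates, "%m") == "12"

# Running count of December rows seen so far; rows before the first
# December stay in group 0
group_id <- cumsum(is_december)

# Sum n within each group (base R counterpart of the data.table call)
sums <- tapply(n, group_id, sum)
```

If the data are unsorted, or a year has two December rows or none, the running count no longer lines up with academic years, so sort first and check the December rows before relying on it.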

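Edit

To do the recoding you can do something like this. Within each AcademicYear group it looks for a known month (May, then August, then December) and, depending on which month it finds, adds or subtracts a year and pastes the two years together. The column then only needs to be renamed and summed as above.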

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
                               ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
                                      paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]

> df
          date  n AcademicYear AcademicYear2
 1: 2012-05-01  1            0     2011/2012
 2: 2012-08-01  2            0     2011/2012
 3: 2012-12-01  3            1     2012/2013
 4: 2013-05-01  4            1     2012/2013
 5: 2013-08-01  5            1     2012/2013
 6: 2013-12-01  6            2     2013/2014
 7: 2014-05-01  7            2     2013/2014
 8: 2014-12-01  8            3     2014/2015
 9: 2015-05-01  9            3     2014/2015
10: 2015-08-01 10            3     2014/2015
11: 2015-12-01 11            4     2015/2016
12: 2016-05-01 12            4     2015/2016
13: 2016-08-01 13            4     2015/2016
14: 2016-12-01 14            5     2016/2017
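
As a possible alternative to the grouped ifelse, the label can also be computed row by row straight from the date. This is only a sketch: academic_year and its start_month argument are hypothetical names, and it assumes the academic year starts in December as described in the question.

```r
# Hypothetical helper: label each date with its academic year.
# Assumes the academic year starts in `start_month` (December here).
academic_year <- function(date, start_month = 12) {
  lt  <- as.POSIXlt(date)
  yr  <- lt$year + 1900   # calendar year
  mon <- lt$mon + 1       # calendar month, 1-12
  # From the start month onwards the academic year begins in this
  # calendar year; earlier months belong to the year that began before
  start <- ifelse(mon >= start_month, yr, yr - 1)
  out <- paste(start, start + 1, sep = "/")
  out[is.na(date)] <- NA_character_   # keep missing dates as NA
  out
}

dates  <- as.Date(c("2012-05-01", "2012-08-01", "2012-12-01", "2013-05-01", NA))
labels <- academic_year(dates)
```

Because each row is labelled on its own, this version does not depend on the row order or on a December row being present in every year.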

Edit 2

I decided to put all the code together. This should give you the final result you are looking for.

library(data.table)
library(lubridate)
df =   data.frame(
  date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
                    16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
  n= 1:14
)

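setDT(df)
# Add the column by reference with := ; mixing df$... = with later := calls
# on a data.table can trigger a copy warning
df[ , AcademicYear := cumsum(month(date) == 12)]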

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
                               ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
                                      paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]


df = df[ , .(Sum = sum(n)), by = .(AcademicYear = AcademicYear2)]

> df
   AcademicYear Sum
1:    2011/2012   3
2:    2012/2013  12
3:    2013/2014  13
4:    2014/2015  27
5:    2015/2016  36
6:    2016/2017  14

Answer 1: (score: 0)

I am not sure which dates you want grouped together, but you can use dplyr and mutate with a series of ifelse statements. It is slow, but it works.

library(dplyr)

df <- data.frame(
  date = structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
  n = 1:14
)

# The date ranges below are illustrative; adjust them to the terms you need.
# Dates that fall outside every range are labelled "other".
df <- mutate(df, term = ifelse(date >= as.Date("2012-05-01") & date <= as.Date("2012-08-01"), "1",
                        ifelse(date >= as.Date("2012-12-01") & date <= as.Date("2013-05-01"), "2",
                        ifelse(date >= as.Date("2013-12-01") & date <= as.Date("2014-12-01"), "3",
                        ifelse(date >= as.Date("2015-08-01") & date <= as.Date("2016-08-01"), "4",
                               "other")))))
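
A chain of nested ifelse calls can usually be replaced by a lookup on sorted cut points. The sketch below uses base R's findInterval for that. Note that it is a different technique from the one above and that it assumes every academic year starts on 1 December (the rule from the question), not the exact ranges used in this answer. The names starts, labels, idx and term are illustrative.

```r
dates <- as.Date(c("2012-05-01", "2012-12-01", "2013-08-01",
                   "2013-12-01", "2016-12-01"))

# Assumed cut points: each academic year starts on 1 December
starts <- as.Date(c("2012-12-01", "2013-12-01", "2014-12-01",
                    "2015-12-01", "2016-12-01"))
labels <- c("2011/2012", "2012/2013", "2013/2014",
            "2014/2015", "2015/2016", "2016/2017")

# findInterval() returns 0 for dates before the first cut point,
# 1 for dates in [starts[1], starts[2]), and so on
idx  <- findInterval(as.numeric(dates), as.numeric(starts))
term <- labels[idx + 1]
```

The cut points must be sorted in increasing order. Adding another year then means appending one date and one label instead of nesting another ifelse.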