I have a data frame like this:
data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
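How can I summarize n by academic year? Each academic year runs from December to August of the following year, so I want the sum of n for each academic year. Recoding by hand is not an option, because there are too many values and some of them are missing.
Ultimately, the recoding should look like this: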
date a.y.
"2012-05-01" 2011/2012
"2012-08-01" 2011/2012
"2012-12-01" 2012/2013
"2013-05-01" 2012/2013
"2013-08-01" 2012/2013
"2013-12-01" 2013/2014
"2014-05-01" 2013/2014
"2014-12-01" 2014/2015
"2015-05-01" 2014/2015
"2015-08-01" 2014/2015
"2015-12-01" 2015/2016
"2016-05-01" 2015/2016
"2016-08-01" 2015/2016
"2016-12-01" 2016/2017
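As you may have noticed, the dates follow a similar pattern, but each academic year can contain different dates.
Answer 0 (score: 1)
If I am reading this right, the academic year changes as soon as we see a December entry. If that is true, the following code will work.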
library(data.table)
library(lubridate)
df = data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
df$AcademicYear = cumsum(month(df$date) == 12)
setDT(df)
df[ , .(Sum = sum(n)), by = .(AcademicYear)]
AcademicYear Sum
1: 0 3
2: 1 12
3: 2 13
4: 3 27
5: 4 36
6: 5 14
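The counter above relies on two things the answer does not state explicitly: the rows must be in date order, and each academic year must contain exactly one December row. A minimal base R sketch of the same idea, assuming those two conditions hold (the `shuffled`, `sorted`, `mon`, and `sums` names are only illustrative):

```r
# Same data as in the question.
df <- data.frame(
  date = structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
  n = 1:14
)

# Deliberately move one row out of place to show why the order matters.
shuffled <- df[c(3, 1, 2, 4:14), ]

# Sort by date first so that each December row really marks the start of a new year.
sorted <- shuffled[order(shuffled$date), ]

# Month as an integer (1-12), using base R only.
mon <- as.integer(format(sorted$date, "%m"))

# The counter goes up by one on every December row and stays flat otherwise.
sorted$AcademicYear <- cumsum(mon == 12)

# Sum n within each counter value.
sums <- tapply(sorted$n, sorted$AcademicYear, sum)
sums
```

If a December row is missing, or a year has two of them, the counter drifts, so it may be worth checking the data for that before relying on this approach.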
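Edit
To recode, you can do something like this. Within each AcademicYear group it looks for a known month and, depending on which month it finds, adds or subtracts a year and pastes the two years together. The column then only needs to be renamed and summed as above.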
df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"),
ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"),
paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]
> df
date n AcademicYear AcademicYear2
1: 2012-05-01 1 0 2011/2012
2: 2012-08-01 2 0 2011/2012
3: 2012-12-01 3 1 2012/2013
4: 2013-05-01 4 1 2012/2013
5: 2013-08-01 5 1 2012/2013
6: 2013-12-01 6 2 2013/2014
7: 2014-05-01 7 2 2013/2014
8: 2014-12-01 8 3 2014/2015
9: 2015-05-01 9 3 2014/2015
10: 2015-08-01 10 3 2014/2015
11: 2015-12-01 11 4 2015/2016
12: 2016-05-01 12 4 2015/2016
13: 2016-08-01 13 4 2015/2016
14: 2016-12-01 14 5 2016/2017
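The same label can probably be derived row by row, without grouping or nested ifelse calls. The sketch below is not the answer's method; it assumes that every December date opens a new academic year and that every other month belongs to the year opened the previous December. The `yr`, `mon`, and `start` names are illustrative.

```r
# Same data as in the question.
df <- data.frame(
  date = structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
                     16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
  n = 1:14
)

# Calendar year and month as integers, base R only.
yr  <- as.integer(format(df$date, "%Y"))
mon <- as.integer(format(df$date, "%m"))

# Assumption: a December date starts the academic year yr/(yr + 1);
# any other month belongs to the academic year (yr - 1)/yr.
start <- ifelse(mon == 12, yr, yr - 1L)
df$AcademicYear2 <- paste(start, start + 1L, sep = "/")
df
```

Because each row is labelled from its own date, this does not depend on row order or on every year having a December entry.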
Edit 2
I decided to put all the code together. This gives you the final result you are looking for.
library(data.table)
library(lubridate)
df = data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
setDT(df)
df$AcademicYear = cumsum(month(df$date) == 12)
df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"),
ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"),
paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]
df = df[ , .(Sum = sum(n)), by = .(AcademicYear = AcademicYear2)]
> df
AcademicYear Sum
1: 2011/2012 3
2: 2012/2013 12
3: 2013/2014 13
4: 2014/2015 27
5: 2015/2016 36
6: 2016/2017 14
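Answer 1 (score: 0)
I am not sure which dates you want grouped together, but you can use dplyr's mutate with a series of ifelse statements. It is slow, but it works.
library(dplyr)
df <- data.frame(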
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
df <- mutate(df, term=ifelse(date >= as.Date("2012-05-01") & date <= as.Date("2012-08-01"), "1",
ifelse(date >= as.Date("2012-12-01") & date <= as.Date("2013-05-01"), "2",
ifelse(date >= as.Date("2013-12-01") & date <= as.Date("2014-12-01"), "3",
ifelse(date >= as.Date("2015-08-01") & date <= as.Date("2016-08-01"), "4",
"other")))))