R:在R中创建一个新列以根据两个日期确定学期

时间:2018-06-26 18:51:36

标签: r date

我有一些数据。 ID和日期,我正在尝试为学期创建一个新字段。

df:

id  date
1   20160822
2   20170109
3   20170828
4   20170925
5   20180108
6   20180402
7   20160711
8   20150831
9   20160111
10  20160502
11  20160829
12  20170109
13  20170501

我还有一个semester表:

start       end         season_year
20120801    20121222    Fall-2012
20121223    20130123    Winter-2013
20130124    20130523    Spring-2013
20130524    20130805    Summer-2013
20130806    20131228    Fall-2013
20131229    20140122    Winter-2014
20140123    20140522    Spring-2014
20140523    20140804    Summer-2014
20140805    20141227    Fall-2014
20141228    20150128    Winter-2015
20150129    20150528    Spring-2015
20150529    20150803    Summer-2015
20150804    20151226    Fall-2015
20151227    20160127    Winter-2016
20160128    20160526    Spring-2016
20160527    20160801    Summer-2016
20160802    20161224    Fall-2016
20161225    20170125    Winter-2017
20170126    20170525    Spring-2017
20170526    20170807    Summer-2017
20170808    20171230    Fall-2017
20171231    20180124    Winter-2018
20180125    20180524    Spring-2018
20180525    20180806    Summer-2018
20180807    20181222    Fall-2018
20181223    20190123    Winter-2019
20190124    20190523    Spring-2019
20190524    20180804    Summer-2019

如果dfdf$datesemester$start之间,我想在semester$end中创建一个新字段,然后放置各自的值{{1} }在semester$season_year

我试图查看lubridate软件包是否可以提供帮助,但这似乎对计算更有用

我看到了this question,它似乎与我想要的最接近,但是,为了使事情变得更复杂,并非我们所有的学期都为六个月

2 个答案:

答案 0 :(得分:2)

这行吗?

library(lubridate)

semester$start <- ymd(semester$start)
semester$end <- ymd(semester$end)
df$date <- ymd(df$date)

LU <-  Map(`:`, semester$start, semester$end)
LU <- data.frame(value = unlist(LU),
                 index = rep(seq_along(LU), lapply(LU, length)))


df$semester <- semester$season_year[LU$index[match(df$date, LU$value)]]

答案 1 :(得分:1)

使用data.tablelubridate软件包的library(data.table) setDT(df) setDT(semester) df[,date:=as.IDate(as.character(date), format = "%Y%m%d")] semester[,':='(start = as.IDate(as.character(start), format = "%Y%m%d"), end=as.IDate(as.character(end), format = "%Y%m%d"))] df[semester, on=.(date >= start, date <= end), season_year := i.season_year] df # id date season_year # 1: 1 2016-08-22 Fall-2016 # 2: 2 2017-01-09 Winter-2017 # 3: 3 2017-08-28 Fall-2017 # 4: 4 2017-09-25 Fall-2017 # 5: 5 2018-01-08 Winter-2018 # 6: 6 2018-04-02 Spring-2018 # 7: 7 2016-07-11 Summer-2016 # 8: 8 2015-08-31 Fall-2015 # 9: 9 2016-01-11 Winter-2016 # 10: 10 2016-05-02 Spring-2016 # 11: 11 2016-08-29 Fall-2016 # 12: 12 2017-01-09 Winter-2017 # 13: 13 2017-05-01 Spring-2017 更新联接的解决方案可以是:

df <- read.table(text="
id  date
1   20160822
2   20170109
3   20170828
4   20170925
5   20180108
6   20180402
7   20160711
8   20150831
9   20160111
10  20160502
11  20160829
12  20170109
13  20170501",
header = TRUE, stringsAsFactors = FALSE)


semester <- read.table(text="
start       end         season_year
20120801    20121222    Fall-2012
20121223    20130123    Winter-2013
20130124    20130523    Spring-2013
20130524    20130805    Summer-2013
20130806    20131228    Fall-2013
20131229    20140122    Winter-2014
20140123    20140522    Spring-2014
20140523    20140804    Summer-2014
20140805    20141227    Fall-2014
20141228    20150128    Winter-2015
20150129    20150528    Spring-2015
20150529    20150803    Summer-2015
20150804    20151226    Fall-2015
20151227    20160127    Winter-2016
20160128    20160526    Spring-2016
20160527    20160801    Summer-2016
20160802    20161224    Fall-2016
20161225    20170125    Winter-2017
20170126    20170525    Spring-2017
20170526    20170807    Summer-2017
20170808    20171230    Fall-2017
20171231    20180124    Winter-2018
20180125    20180524    Spring-2018
20180525    20180806    Summer-2018
20180807    20181222    Fall-2018
20181223    20190123    Winter-2019
20190124    20190523    Spring-2019
20190524    20180804    Summer-2019",
header = TRUE, stringsAsFactors = FALSE)

数据:

karma.conf.js