我有一些数据。 ID和日期,我正在尝试为学期创建一个新字段。
df:
id date
1 20160822
2 20170109
3 20170828
4 20170925
5 20180108
6 20180402
7 20160711
8 20150831
9 20160111
10 20160502
11 20160829
12 20170109
13 20170501
我还有一个semester
表:
start end season_year
20120801 20121222 Fall-2012
20121223 20130123 Winter-2013
20130124 20130523 Spring-2013
20130524 20130805 Summer-2013
20130806 20131228 Fall-2013
20131229 20140122 Winter-2014
20140123 20140522 Spring-2014
20140523 20140804 Summer-2014
20140805 20141227 Fall-2014
20141228 20150128 Winter-2015
20150129 20150528 Spring-2015
20150529 20150803 Summer-2015
20150804 20151226 Fall-2015
20151227 20160127 Winter-2016
20160128 20160526 Spring-2016
20160527 20160801 Summer-2016
20160802 20161224 Fall-2016
20161225 20170125 Winter-2017
20170126 20170525 Spring-2017
20170526 20170807 Summer-2017
20170808 20171230 Fall-2017
20171231 20180124 Winter-2018
20180125 20180524 Spring-2018
20180525 20180806 Summer-2018
20180807 20181222 Fall-2018
20181223 20190123 Winter-2019
20190124 20190523 Spring-2019
20190524 20180804 Summer-2019
如果df
在df$date
和semester$start
之间,我想在semester$end
中创建一个新字段,然后放置各自的值{{1} }在semester$season_year
我试图查看lubridate软件包是否可以提供帮助,但这似乎对计算更有用
我看到了this question,它似乎与我想要的最接近,但是,为了使事情变得更复杂,并非我们所有的学期都为六个月
答案 0 :(得分:2)
这行吗?
library(lubridate)
semester$start <- ymd(semester$start)
semester$end <- ymd(semester$end)
df$date <- ymd(df$date)
LU <- Map(`:`, semester$start, semester$end)
LU <- data.frame(value = unlist(LU),
index = rep(seq_along(LU), lapply(LU, length)))
df$semester <- semester$season_year[LU$index[match(df$date, LU$value)]]
答案 1 :(得分:1)
使用data.table
和lubridate
软件包的library(data.table)
setDT(df)
setDT(semester)
df[,date:=as.IDate(as.character(date), format = "%Y%m%d")]
semester[,':='(start = as.IDate(as.character(start), format = "%Y%m%d"),
end=as.IDate(as.character(end), format = "%Y%m%d"))]
df[semester, on=.(date >= start, date <= end), season_year := i.season_year]
df
# id date season_year
# 1: 1 2016-08-22 Fall-2016
# 2: 2 2017-01-09 Winter-2017
# 3: 3 2017-08-28 Fall-2017
# 4: 4 2017-09-25 Fall-2017
# 5: 5 2018-01-08 Winter-2018
# 6: 6 2018-04-02 Spring-2018
# 7: 7 2016-07-11 Summer-2016
# 8: 8 2015-08-31 Fall-2015
# 9: 9 2016-01-11 Winter-2016
# 10: 10 2016-05-02 Spring-2016
# 11: 11 2016-08-29 Fall-2016
# 12: 12 2017-01-09 Winter-2017
# 13: 13 2017-05-01 Spring-2017
更新联接的解决方案可以是:
df <- read.table(text="
id date
1 20160822
2 20170109
3 20170828
4 20170925
5 20180108
6 20180402
7 20160711
8 20150831
9 20160111
10 20160502
11 20160829
12 20170109
13 20170501",
header = TRUE, stringsAsFactors = FALSE)
semester <- read.table(text="
start end season_year
20120801 20121222 Fall-2012
20121223 20130123 Winter-2013
20130124 20130523 Spring-2013
20130524 20130805 Summer-2013
20130806 20131228 Fall-2013
20131229 20140122 Winter-2014
20140123 20140522 Spring-2014
20140523 20140804 Summer-2014
20140805 20141227 Fall-2014
20141228 20150128 Winter-2015
20150129 20150528 Spring-2015
20150529 20150803 Summer-2015
20150804 20151226 Fall-2015
20151227 20160127 Winter-2016
20160128 20160526 Spring-2016
20160527 20160801 Summer-2016
20160802 20161224 Fall-2016
20161225 20170125 Winter-2017
20170126 20170525 Spring-2017
20170526 20170807 Summer-2017
20170808 20171230 Fall-2017
20171231 20180124 Winter-2018
20180125 20180524 Spring-2018
20180525 20180806 Summer-2018
20180807 20181222 Fall-2018
20181223 20190123 Winter-2019
20190124 20190523 Spring-2019
20190524 20180804 Summer-2019",
header = TRUE, stringsAsFactors = FALSE)
数据:
karma.conf.js