R: create new rows in a data frame for given dates between two time points

Asked: 2017-06-02 15:05:57

Tags: r date

I have a df containing the start and end dates of some process. Data to reproduce it:

ID1 <- c("AUT","AUT","AUT","BEL","BEL","BEL")
start_date <- c("2008-12-02", "2013-12-16", "2016-05-17", "2007-06-10", "2007-12-21", "2008-03-20")
start_date <- as.Date(start_date, "%Y-%m-%d")
end_date <- c("2013-12-15", "2016-05-16", "2017-11-30", "2007-12-20", "2008-03-19", "2008-12-29")
end_date <- as.Date(end_date, "%Y-%m-%d")
ID2 <- 1:6
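# Name the date columns explicitly; otherwise data.frame() derives names from the as.character() calls
df <- data.frame(ID1, start_date = as.character(start_date), end_date = as.character(end_date), ID2)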

It looks like this:

   ID1       start_date     end_date       ID2
1  AUT      2008-12-02     2013-12-15      1     
2  AUT      2013-12-16     2016-05-16      2     
3  AUT      2016-05-17     2017-11-30      3     
4  BEL      2007-06-10     2007-12-20      4     
5  BEL      2007-12-21     2008-03-19      5     
6  BEL      2008-03-20     2008-12-29      6

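I want to add a new column and new rows to this df for the given periods. The column should be year, and its value depends on whether the first day of a year (20XX-01-01) falls inside or outside the process boundaries. What I would like to see is: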

    ID1       start_date     end_date       ID2    year
1   AUT      2008-12-02     2013-12-15      1     2009
2   AUT      2008-12-02     2013-12-15      1     2010
3   AUT      2008-12-02     2013-12-15      1     2011
4   AUT      2008-12-02     2013-12-15      1     2012
5   AUT      2008-12-02     2013-12-15      1     2013
6   AUT      2013-12-16     2016-05-16      2     2014
7   AUT      2013-12-16     2016-05-16      2     2015
8   AUT      2013-12-16     2016-05-16      2     2016
9   AUT      2016-05-17     2017-11-30      3     2017
10  BEL      2007-06-10     2007-12-20      4     NA
11  BEL      2007-12-21     2008-03-19      5     2008
12  BEL      2008-03-20     2008-12-29      6     NA

Edit: minor suggested changes to improve clarity.

1 answer:

Answer 0 (score: 1)

I modified the initial code slightly:

library(lubridate)
library(data.table)

ID1 <- c("AUT","AUT","AUT","BEL","BEL","BEL")
start_date <- ymd("2008-12-02", "2013-12-16", "2016-05-17", "2007-06-10", "2007-12-21", "2008-03-20")
end_date <- ymd("2013-12-15", "2016-05-16", "2017-11-30", "2007-12-20", "2008-03-19", "2008-12-29")
ID2 <- 1:6

df <- data.table(ID1, start_date, end_date, ID2)

# Difference between the end year and the start year of each period
df[, yearDiff := year(end_date) - year(start_date)]

# One row per candidate year. Note: when a period starts and ends in the same
# year, (y + 1):y counts downward and yields two rows (y + 1 and y).
df <- df[, cbind(.SD, year = (year(start_date) + 1):year(end_date)), by = "ID2"]

# Flag whether 1 January of each candidate year lies inside the period
df[, dateInterval := interval(start_date, end_date)]
df[, IsYearWithinDate := ymd(paste(year, "01", "01", sep = "-")) %within% dateInterval]

df[, .(ID1, start_date, end_date, ID2, year, IsYearWithinDate)]

This results in the following (shown via df[, .(ID1, start_date, end_date, ID2, year)]):

   ID1 start_date   end_date ID2 year
 1: AUT 2008-12-02 2013-12-15   1 2009
 2: AUT 2008-12-02 2013-12-15   1 2010
 3: AUT 2008-12-02 2013-12-15   1 2011
 4: AUT 2008-12-02 2013-12-15   1 2012
 5: AUT 2008-12-02 2013-12-15   1 2013
 6: AUT 2013-12-16 2016-05-16   2 2014
 7: AUT 2013-12-16 2016-05-16   2 2015
 8: AUT 2013-12-16 2016-05-16   2 2016
 9: AUT 2016-05-17 2017-11-30   3 2017
10: BEL 2007-06-10 2007-12-20   4 2008
11: BEL 2007-06-10 2007-12-20   4 2007
12: BEL 2007-12-21 2008-03-19   5 2008
13: BEL 2008-03-20 2008-12-29   6 2009
14: BEL 2008-03-20 2008-12-29   6 2008
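
The output above still contains two rows each for ID2 4 and 6, because `(y + 1):y` counts downward when a period starts and ends in the same year. The question asked for a single row with NA in that case. Below is a minimal base-R sketch of one way to get that shape. It is not the answerer's method: the helper name `jan1_years` and the object names `periods`, `years`, and `result` are illustrative assumptions, and it assumes every start_date is on or before its end_date.

```r
# Hypothetical helper (not from the original answer): the years whose
# 1 January falls inside [start, end], or NA when there is no such year.
jan1_years <- function(start, end) {
  first <- as.integer(format(start, "%Y"))
  last  <- as.integer(format(end, "%Y"))
  # The start year only counts if the period begins exactly on 1 January.
  if (format(start, "%m-%d") != "01-01") first <- first + 1L
  # Guard against first:last counting downward when first > last.
  if (first > last) NA_integer_ else seq.int(first, last)
}

ID1 <- c("AUT", "AUT", "AUT", "BEL", "BEL", "BEL")
start_date <- as.Date(c("2008-12-02", "2013-12-16", "2016-05-17",
                        "2007-06-10", "2007-12-21", "2008-03-20"))
end_date <- as.Date(c("2013-12-15", "2016-05-16", "2017-11-30",
                      "2007-12-20", "2008-03-19", "2008-12-29"))
periods <- data.frame(ID1, start_date, end_date, ID2 = 1:6)

# One vector of years per period.
years <- lapply(seq_len(nrow(periods)), function(i) {
  jan1_years(periods$start_date[i], periods$end_date[i])
})

# Repeat each original row once per year (or once for the NA placeholder).
result <- periods[rep(seq_len(nrow(periods)), lengths(years)), ]
result$year <- unlist(years)
rownames(result) <- NULL
print(result)
```

With the sample data this gives 12 rows, with year set to NA for ID2 4 and 6. The same guard could be carried into the data.table version by replacing the bare `:` with a check that the first candidate year does not exceed the last one.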