Check whether a date falls within an interval and pull in a field

Time: 2018-08-11 01:11:09

Tags: r date intervals

I have two data frames, maindf and list:

ID <- c(1, 1, 1, 1, 5, 5)
SURVEY_DATE <- c("1997-08-01", "1998-08-20", "1998-11-20", "2000-12-13", "1998-05-02", "1998-12-25")
SURVEY_DATE <- as.Date(SURVEY_DATE)
maindf <- data.frame(ID, SURVEY_DATE)
maindf

ID <- c(1, 1, 1, 1, 5, 5)
ASSIGN_DATE <- c(1997, 1998, 1999, 2000, 1997, 1998)
TIME1 <- c("1997-07-23", "1998-11-17", "1999-12-15", "2000-12-11", "1998-04-07", "1998-12-06")
TIME1 <- as.Date(TIME1)
TIME2 <- c("1998-11-17", "1999-12-15", "2000-12-11", "2001-12-30", "1998-12-06", "1999-11-28")
TIME2 <- as.Date(TIME2)
list <- data.frame(ID, ASSIGN_DATE, TIME1, TIME2)
list

list has an ASSIGN_DATE field. For each row of maindf, I need to check whether SURVEY_DATE falls between TIME1 and TIME2 in list (for the same ID). If it does, I want to pull ASSIGN_DATE into maindf.

The final product should look something like:

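ID SURVEY_DATE ASSIGN_DATE
 1  1997-08-01        1997
 1  1998-08-20        1997
 1  1998-11-20        1998
 1  2000-12-13        2000
 5  1998-05-02        1997
 5  1998-12-25        1998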

I know this is very similar to this post and this post, but I am having some trouble pulling the ASSIGN_DATE field over.

5 answers:

Answer 0: (score: 2)

The OP has requested to "pull ASSIGN_DATE into maindf".

This can be achieved with an update join, which modifies maindf by reference:

library(data.table)
setDT(maindf)[setDT(list), on = .(ID, SURVEY_DATE >= TIME1, SURVEY_DATE <= TIME2), 
       ASSIGN_DATE := i.ASSIGN_DATE][]
   ID SURVEY_DATE ASSIGN_DATE
1:  1  1997-08-01        1997
2:  1  1998-08-20        1997
3:  1  1998-11-20        1998
4:  1  2000-12-13        2000
5:  5  1998-05-02        1997
6:  5  1998-12-25        1998

Answer 1: (score: 0)

I lack the ingenuity to come up with anything more creative than a for loop, but at least it gets the job done:

# recreate data (because I like lowercase)
maindf <- data.frame(
    id = c(1, 1, 1, 1, 5, 5), 
    sdate = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20", "2000-12-13", "1998-05-02", "1998-12-25")))

otherdf <- data.frame(
    id = c(1, 1, 1, 1, 5, 5),
    adate = c(1997, 1998, 1999, 2000, 1997, 1998),
    time1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15", "2000-12-11", "1998-04-07", "1998-12-06")),
    time2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11", "2001-12-30", "1998-12-06", "1999-11-28"))
)

# my sad loop
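maindf$adate <- NA_real_
for (i in seq_len(nrow(maindf))) {
    c1 <- otherdf$id    == maindf[i, "id"]
    c2 <- otherdf$time1 <  maindf[i, "sdate"]
    c3 <- otherdf$time2 >  maindf[i, "sdate"]
    hit <- otherdf[c1 & c2 & c3, "adate"]
    # leave NA when no interval matches; take the first hit if several overlap
    if (length(hit) > 0) maindf[i, "adate"] <- hit[1]
}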

Answer 2: (score: 0)

Option 1: the data.table way

Using data.table::foverlaps

library(data.table)
setDT(maindf)[, `:=`(TIME1 = SURVEY_DATE, TIME2 = SURVEY_DATE)]
setDT(list)

# Interval-merge by TIME1 and TIME2
setkey(list, ID, TIME1, TIME2)
dt <- foverlaps(maindf, list)

# Clean up to reproduce expected output
dt[, .SD, .SDcols = c(names(maindf)[1:2], "ASSIGN_DATE")]
#   ID SURVEY_DATE ASSIGN_DATE
#1:  1  1997-08-01        1997
#2:  1  1998-08-20        1997
#3:  1  1998-11-20        1998
#4:  1  2000-12-13        2000
#5:  5  1998-05-02        1997
#6:  5  1998-12-25        1998

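Explanation: foverlaps performs an overlap join based on intervals from two data.tables. It requires a start and an end point in each data.table, so for maindf we set TIME1 = SURVEY_DATE as the start point and TIME2 = SURVEY_DATE as the end point, which turns each survey date into a zero-length interval. For its second argument, foverlaps needs to know the keys to merge on (here ID, TIME1, TIME2), which we set with setkey.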


Option 2: the fuzzyjoin / tidyverse way

Using fuzzyjoin::fuzzy_left_join
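
The code for this option did not survive in this copy of the answer. What follows is only a minimal sketch of how fuzzy_left_join could be applied here, not the answerer's original code. It assumes the fuzzyjoin package is installed, and it uses the hypothetical name intervals in place of the question's list so that the base function list() is not shadowed.

```r
# Hedged sketch: assumes the fuzzyjoin package is installed.
library(fuzzyjoin)

# Data recreated from the question
maindf <- data.frame(
  ID = c(1, 1, 1, 1, 5, 5),
  SURVEY_DATE = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20",
                          "2000-12-13", "1998-05-02", "1998-12-25"))
)

# `intervals` is a hypothetical name standing in for the question's `list`
intervals <- data.frame(
  ID = c(1, 1, 1, 1, 5, 5),
  ASSIGN_DATE = c(1997, 1998, 1999, 2000, 1997, 1998),
  TIME1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15",
                    "2000-12-11", "1998-04-07", "1998-12-06")),
  TIME2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11",
                    "2001-12-30", "1998-12-06", "1999-11-28"))
)

# One match function per column pair:
# ID == ID, SURVEY_DATE >= TIME1, SURVEY_DATE <= TIME2
joined <- fuzzy_left_join(
  maindf, intervals,
  by = c("ID" = "ID", "SURVEY_DATE" = "TIME1", "SURVEY_DATE" = "TIME2"),
  match_fun = list(`==`, `>=`, `<=`)
)

# ID exists in both inputs, so the join returns it as ID.x and ID.y
result <- data.frame(
  ID = joined$ID.x,
  SURVEY_DATE = joined$SURVEY_DATE,
  ASSIGN_DATE = joined$ASSIGN_DATE
)
result
```

Because this is a left join, a survey date that falls outside every interval would still appear in the result, with ASSIGN_DATE set to NA.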

Answer 3: (score: 0)

data.table "non-equi join" for the win:

#re-create data as data.tables and with lowercase
library(data.table)
maindt <- data.table(
    id = c(1, 1, 1, 1, 5, 5), 
    sdate = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20", "2000-12-13", "1998-05-02", "1998-12-25")))

otherdt <- data.table(
    id = c(1, 1, 1, 1, 5, 5),
    adate = c(1997, 1998, 1999, 2000, 1997, 1998),
    time1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15", "2000-12-11", "1998-04-07", "1998-12-06")),
    time2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11", "2001-12-30", "1998-12-06", "1999-11-28"))
)

#one-line merge
maindt[otherdt, on = .(id, sdate > time1, sdate < time2), .(id, sdate = x.sdate, adate), nomatch = 0]

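In my opinion the non-equi join syntax is a nightmare, but I have always struggled with the dt1[dt2] merge style, so what do I know...

Answer 4: (score: 0)

A base R solution using a full outer join and a conditional subset...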

#full outer join 
foj <- merge(maindf, list, all = TRUE, by = "ID")
#conditional subset
df2 <- subset(foj, SURVEY_DATE >= TIME1 & SURVEY_DATE <= TIME2)

# > df2[, c("ID", "SURVEY_DATE", "ASSIGN_DATE")]
#    ID SURVEY_DATE ASSIGN_DATE
# 1   1  1997-08-01        1997
# 5   1  1998-08-20        1997
# 10  1  1998-11-20        1998
# 16  1  2000-12-13        2000
# 17  5  1998-05-02        1997
# 20  5  1998-12-25        1998