I have two dfs, maindf and list:
ID <- c(1, 1, 1, 1, 5, 5)
SURVEY_DATE <- c("1997-08-01", "1998-08-20", "1998-11-20", "2000-12-13", "1998-05-02", "1998-12-25")
SURVEY_DATE <- as.Date(SURVEY_DATE)
maindf <- data.frame(ID, SURVEY_DATE)
maindf
ID <- c(1, 1, 1, 1, 5, 5)
ASSIGN_DATE <- c(1997, 1998, 1999, 2000, 1997, 1998)
TIME1 <- c("1997-07-23", "1998-11-17", "1999-12-15", "2000-12-11", "1998-04-07", "1998-12-06")
TIME1 <- as.Date(TIME1)
TIME2 <- c("1998-11-17", "1999-12-15", "2000-12-11", "2001-12-30", "1998-12-06", "1999-11-28")
TIME2 <- as.Date(TIME2)
list <- data.frame(ID, ASSIGN_DATE, TIME1, TIME2)
list
list has an ASSIGN_DATE field. This field needs to be checked against maindf to see whether SURVEY_DATE falls between TIME1 and TIME2 of list. If it does, I would like to pull ASSIGN_DATE into maindf.
The final product should look similar to:
ID
Answer 0 (score: 2)
The OP asked to "pull ASSIGN_DATE into maindf".
This can be done with an update join, which modifies maindf by reference:
library(data.table)
setDT(maindf)[setDT(list), on = .(ID, SURVEY_DATE >= TIME1, SURVEY_DATE <= TIME2),
ASSIGN_DATE := i.ASSIGN_DATE][]
   ID SURVEY_DATE ASSIGN_DATE
1:  1  1997-08-01        1997
2:  1  1998-08-20        1997
3:  1  1998-11-20        1998
4:  1  2000-12-13        2000
5:  5  1998-05-02        1997
6:  5  1998-12-25        1998
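Because the update join changes maindf in place, it may be worth seeing how to keep the original table untouched. The following is a minimal sketch, assuming data.table is installed; the names lookup and result are hypothetical and used only for illustration (lookup stands in for the OP's list, which shadows a base R function name).

```r
library(data.table)

# Recreate the question's data as data.tables
maindf <- data.table(
  ID = c(1, 1, 1, 1, 5, 5),
  SURVEY_DATE = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20",
                          "2000-12-13", "1998-05-02", "1998-12-25"))
)
# 'lookup' is a hypothetical name for the OP's 'list'
lookup <- data.table(
  ID = c(1, 1, 1, 1, 5, 5),
  ASSIGN_DATE = c(1997, 1998, 1999, 2000, 1997, 1998),
  TIME1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15",
                    "2000-12-11", "1998-04-07", "1998-12-06")),
  TIME2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11",
                    "2001-12-30", "1998-12-06", "1999-11-28"))
)

# copy() makes a deep copy first, so := adds the column to the copy only
result <- copy(maindf)[lookup,
                       on = .(ID, SURVEY_DATE >= TIME1, SURVEY_DATE <= TIME2),
                       ASSIGN_DATE := i.ASSIGN_DATE]

print(result)
print(names(maindf))  # the original table still has only its two columns
```

Rows of maindf that fall inside no interval would simply keep NA in ASSIGN_DATE, since an update join only touches matching rows.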
Answer 1 (score: 0)
I lack the ingenuity to come up with anything more creative than a for loop, but at least it gets the job done:
# recreate data (because I like lowercase)
maindf <- data.frame(
id = c(1, 1, 1, 1, 5, 5),
sdate = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20", "2000-12-13", "1998-05-02", "1998-12-25")))
otherdf <- data.frame(
id = c(1, 1, 1, 1, 5, 5),
adate = c(1997, 1998, 1999, 2000, 1997, 1998),
time1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15", "2000-12-11", "1998-04-07", "1998-12-06")),
time2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11", "2001-12-30", "1998-12-06", "1999-11-28"))
)
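# my sad loop
maindf$adate <- NA_real_
for (i in seq_len(nrow(maindf))) {
  c1 <- otherdf$id == maindf[i, "id"]       # same id
  c2 <- otherdf$time1 < maindf[i, "sdate"]  # interval starts before the survey date
  c3 <- otherdf$time2 > maindf[i, "sdate"]  # interval ends after the survey date
  hit <- otherdf[c1 & c2 & c3, "adate"]
  # take the first match; leave NA when no interval contains the date
  if (length(hit) > 0) maindf[i, "adate"] <- hit[1]
}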
Answer 2 (score: 0)
Option 1: the data.table way
Using data.table::foverlaps
library(data.table)
setDT(maindf)[, `:=`(TIME1 = SURVEY_DATE, TIME2 = SURVEY_DATE)]
setDT(list)
# Interval-merge by TIME1 and TIME2
setkey(list, ID, TIME1, TIME2)
dt <- foverlaps(maindf, list)
# Clean up to reproduce expected output
dt[, .SD, .SDcols = c(names(maindf)[1:2], "ASSIGN_DATE")]
# ID SURVEY_DATE ASSIGN_DATE
#1: 1 1997-08-01 1997
#2: 1 1998-08-20 1997
#3: 1 1998-11-20 1998
#4: 1 2000-12-13 2000
#5: 5 1998-05-02 1997
#6: 5 1998-12-25 1998
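Explanation: foverlaps performs an overlap join based on intervals from two data.tables. It needs a start and an end point in each data.table, so we use TIME1 = SURVEY_DATE as the start point and TIME2 = SURVEY_DATE as the end point for maindf. For its second argument, foverlaps needs to know the key columns to merge on (here ID, TIME1 and TIME2), which we set with setkey.
Option 2: the fuzzyjoin / tidyverse way
Using fuzzyjoin::fuzzy_left_join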
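A minimal sketch of what such a fuzzy_left_join call could look like for this data, assuming the fuzzyjoin package is installed. The names lookup, joined and result are hypothetical, and the inclusive comparisons (>= and <=) are an assumption taken from the update join in Answer 0, not from this answer.

```r
library(fuzzyjoin)

maindf <- data.frame(
  ID = c(1, 1, 1, 1, 5, 5),
  SURVEY_DATE = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20",
                          "2000-12-13", "1998-05-02", "1998-12-25"))
)
# 'lookup' is a hypothetical name for the OP's 'list'
lookup <- data.frame(
  ID = c(1, 1, 1, 1, 5, 5),
  ASSIGN_DATE = c(1997, 1998, 1999, 2000, 1997, 1998),
  TIME1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15",
                    "2000-12-11", "1998-04-07", "1998-12-06")),
  TIME2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11",
                    "2001-12-30", "1998-12-06", "1999-11-28"))
)

# One comparison function per column pair:
# ID == ID, SURVEY_DATE >= TIME1, SURVEY_DATE <= TIME2
joined <- fuzzy_left_join(
  maindf, lookup,
  by = c("ID" = "ID", "SURVEY_DATE" = "TIME1", "SURVEY_DATE" = "TIME2"),
  match_fun = list(`==`, `>=`, `<=`)
)

# ID exists in both inputs, so the join suffixes it as ID.x / ID.y
result <- data.frame(
  ID = joined$ID.x,
  SURVEY_DATE = joined$SURVEY_DATE,
  ASSIGN_DATE = joined$ASSIGN_DATE
)
print(result)
```

Being a left join, this keeps every row of maindf and fills ASSIGN_DATE with NA where no interval matches.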
Answer 3 (score: 0)
data.table "non-equi join" for the win:
#re-create data as data.tables and with lowercase
library(data.table)
maindt <- data.table(
id = c(1, 1, 1, 1, 5, 5),
sdate = as.Date(c("1997-08-01", "1998-08-20", "1998-11-20", "2000-12-13", "1998-05-02", "1998-12-25")))
otherdt <- data.table(
id = c(1, 1, 1, 1, 5, 5),
adate = c(1997, 1998, 1999, 2000, 1997, 1998),
time1 = as.Date(c("1997-07-23", "1998-11-17", "1999-12-15", "2000-12-11", "1998-04-07", "1998-12-06")),
time2 = as.Date(c("1998-11-17", "1999-12-15", "2000-12-11", "2001-12-30", "1998-12-06", "1999-11-28"))
)
#one-line merge
maindt[otherdt, on = .(id, sdate > time1, sdate < time2), .(id, sdate = x.sdate, adate), nomatch = 0]
In my opinion the non-equi join syntax is a nightmare, but then I have always struggled with the dt1[dt2] merge style, so what do I know...
Answer 4 (score: 0)
A base R solution using a full outer join and a conditional subset...
#full outer join
foj <- merge(maindf, list, all = TRUE, by = "ID")
#conditional subset
df2 <- subset(foj, SURVEY_DATE >= TIME1 & SURVEY_DATE <= TIME2)
# > df2[, c("ID", "SURVEY_DATE", "ASSIGN_DATE")]
# ID SURVEY_DATE ASSIGN_DATE
# 1 1 1997-08-01 1997
# 5 1 1998-08-20 1997
# 10 1 1998-11-20 1998
# 16 1 2000-12-13 2000
# 17 5 1998-05-02 1997
# 20 5 1998-12-25 1998
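One thing to keep in mind with the join-then-subset approach is that a survey row falling inside no interval is dropped by the subset. If such rows should be kept with NA instead, a left merge back onto maindf is one possible way to do it. This is a minimal base R sketch; the names lookup, hits and result are hypothetical, and the 1990-01-01 row is made-up data added only to show the unmatched case.

```r
# Small example with one survey date (1990-01-01) that matches no interval
maindf <- data.frame(
  ID = c(1, 1, 5),
  SURVEY_DATE = as.Date(c("1997-08-01", "1990-01-01", "1998-05-02"))
)
# 'lookup' is a hypothetical name for the OP's 'list'
lookup <- data.frame(
  ID = c(1, 1, 5),
  ASSIGN_DATE = c(1997, 1998, 1997),
  TIME1 = as.Date(c("1997-07-23", "1998-11-17", "1998-04-07")),
  TIME2 = as.Date(c("1998-11-17", "1999-12-15", "1998-12-06"))
)

# Join on ID, then keep only the pairs where the date lies in the interval
foj <- merge(maindf, lookup, by = "ID")
hits <- subset(foj, SURVEY_DATE >= TIME1 & SURVEY_DATE <= TIME2,
               select = c(ID, SURVEY_DATE, ASSIGN_DATE))

# Left merge back so unmatched survey rows are kept with NA
result <- merge(maindf, hits, by = c("ID", "SURVEY_DATE"), all.x = TRUE)
print(result)
```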