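I want to fill df1 (which contains df1$id, df1$datetime_interval, df1$datetime_event and df1$event) with data from df2 (which contains df2$id and df2$datetime_event), subject to the following conditions.
If df1$id and df2$id match, and df2$datetime_event falls within df1$datetime_interval, then I want the value of df2$datetime_event copied into df1$datetime_event, and a string (e.g. "yes") written to the df1$event column.
If the conditions are not met, I don't want any result (NA).
So: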
df1
ID datetime_interval datetime_event event
1 2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC NA NA
1 2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC NA NA
2 2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC NA NA
3 2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC NA NA
3 2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC NA NA
6 2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC NA NA
df2
ID datetime_event
1 2019-04-19 21:55:00
3 2019-05-06 21:23:00
5 2019-07-04 19:45:00
6 2019-05-06 17:18:00
6 2019-08-03 10:10:00
I have tried a few things, but none of them worked the way I wanted. I'm still missing some steps and don't know how to continue. This is what I have so far:
for(i in seq_along(df1$id)){
  for(j in seq_along(df2$id)){
    ifelse(df2$id[j] == df1$id[i]) {
      ifelse(df2$datetime_event[j] %within% df1$datetime_interval[i] == TRUE){
        df1$datetime_event <- df2$datetime_ic_corr[j]
      }
    }
  }
}
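For reference, the loop above does not run as written: ifelse() is a vectorised function rather than a control-flow statement, the columns are named ID (not id) in the dput() output, df2$datetime_ic_corr does not exist in the sample data, and the assignment has no [i] index. Below is a minimal base-R sketch of a working loop. It assumes the timestamps are UTC and uses plain start/end comparisons as a stand-in for lubridate's %within%. The names parse_utc, starts, ends and events are illustrative only.

```r
# Sample data rebuilt from the dput() output (only the columns needed here)
df1 <- data.frame(
  ID = c(1, 1, 2, 3, 3, 6),
  datetime_interval = c(
    "2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC",
    "2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC",
    "2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC",
    "2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC",
    "2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC",
    "2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC"
  ),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(1, 3, 5, 6, 6),
  datetime_event = c(
    "2019-04-19 21:55:00 UTC", "2019-05-06 21:23:00 UTC",
    "2019-07-04 19:45:00 UTC", "2019-05-06 17:18:00 UTC",
    "2019-08-03 10:10:00 UTC"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse "YYYY-mm-dd HH:MM:SS UTC" strings as UTC
# (the trailing " UTC" is ignored once the format has been consumed)
parse_utc <- function(x) {
  as.POSIXct(x, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
}

# Split each interval string into its start and end timestamps
bounds <- strsplit(df1$datetime_interval, "--", fixed = TRUE)
starts <- parse_utc(sapply(bounds, `[`, 1))
ends   <- parse_utc(sapply(bounds, `[`, 2))
events <- parse_utc(df2$datetime_event)

# Start from all-NA result columns of the right types
df1$datetime_event <- as.POSIXct(rep(NA, nrow(df1)), tz = "UTC")
df1$event <- NA_character_

for (i in seq_len(nrow(df1))) {
  for (j in seq_len(nrow(df2))) {
    same_id   <- df2$ID[j] == df1$ID[i]
    in_window <- events[j] >= starts[i] && events[j] <= ends[i]
    if (same_id && in_window) {
      # If several events fall inside one interval, the last one wins
      df1$datetime_event[i] <- events[j]
      df1$event[i] <- "yes"
    }
  }
}
```

With lubridate loaded, the in_window test could instead be written with %within% against interval(starts[i], ends[i]). The nested loop is fine for small tables, but a join is likely to scale better.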
The result I want looks like this:
df1
ID datetime_interval datetime_event event
1 2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC 2019-04-19 21:55:00 yes
1 2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC NA NA
2 2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC NA NA
3 2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC NA NA
3 2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC 2019-05-06 21:23:00 yes
6 2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC 2019-08-03 10:10:00 yes
Thanks in advance for any input! I'm stuck...
dput(df1)
structure(list(ID = c(1, 1, 2, 3, 3, 6), datetime_interval = c("2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC",
"2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC", "2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC",
"2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC", "2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC",
"2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC"), datetime_event = c("NA",
"NA", "NA", "NA", "NA", "NA"), event = c("NA", "NA", "NA", "NA",
"NA", "NA")), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
dput(df2)
structure(list(ID = c(1, 3, 5, 6, 6), datetime_event = c("2019-04-19 21:55:00 UTC",
"2019-05-06 21:23:00 UTC", "2019-07-04 19:45:00 UTC", "2019-05-06 17:18:00 UTC",
"2019-08-03 10:10:00 UTC")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Answer 0 (score: 0)
Tricky problem. I think this works:
library(dplyr)
library(tidyr)

# convert datetime_interval to datetime class start and end columns
# and add row IDs
df1 = df1 %>%
  separate(datetime_interval, into = c("start", "end"), sep = "--") %>%
  mutate_at(vars(start, end), as.POSIXct) %>%
  select(-datetime_event, -event) %>%
  mutate(row_id = row_number())

# convert datetime event to datetime class
df2 = df2 %>%
  mutate(datetime_event = as.POSIXct(datetime_event))

# join and filter
df1 %>% left_join(df2, by = "ID") %>%
  mutate(
    datetime_event = ifelse(datetime_event >= start & datetime_event <= end, datetime_event, NA),
    event = ifelse(is.na(datetime_event), NA, "yes")
  ) %>%
  arrange(row_id, datetime_event) %>%
  group_by(row_id) %>%
  slice(1)
# # A tibble: 6 x 6
# # Groups: row_id [6]
# ID start end row_id datetime_event event
# <dbl> <dttm> <dttm> <int> <dbl> <chr>
# 1 1 2019-04-19 21:50:00 2019-04-20 21:31:00 1 1555725300 yes
# 2 1 2019-07-02 04:23:00 2019-07-02 08:51:00 2 NA NA
# 3 2 2019-07-04 19:45:00 2019-07-05 00:30:00 3 NA NA
# 4 3 2019-06-07 08:55:00 2019-06-07 14:43:00 4 NA NA
# 5 3 2019-05-06 17:18:00 2019-05-06 23:18:00 5 1557192180 yes
# 6 6 2019-08-02 22:00:00 2019-08-04 03:10:00 6 1564841400 yes
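Note that datetime_event prints as a number of seconds in the output above because base ifelse() drops the POSIXct class. A hedged variant of the same pipeline is sketched below. It assumes dplyr >= 1.0 (for across()) and that the timestamps should be read as UTC. It uses if_else() with a typed NA so the column stays a datetime. The helper parse_utc and the name result are illustrative, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the dput() output in the question
df1 <- tibble(
  ID = c(1, 1, 2, 3, 3, 6),
  datetime_interval = c(
    "2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC",
    "2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC",
    "2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC",
    "2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC",
    "2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC",
    "2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC"
  )
)
df2 <- tibble(
  ID = c(1, 3, 5, 6, 6),
  datetime_event = c(
    "2019-04-19 21:55:00 UTC", "2019-05-06 21:23:00 UTC",
    "2019-07-04 19:45:00 UTC", "2019-05-06 17:18:00 UTC",
    "2019-08-03 10:10:00 UTC"
  )
)

# Hypothetical helper: parse the strings as UTC instead of the local time zone
parse_utc <- function(x) {
  as.POSIXct(x, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
}

result <- df1 %>%
  separate(datetime_interval, into = c("start", "end"), sep = "--") %>%
  mutate(across(c(start, end), parse_utc), row_id = row_number()) %>%
  # Recent dplyr versions may warn about a many-to-many match here;
  # that is expected, because IDs repeat in both tables
  left_join(mutate(df2, datetime_event = parse_utc(datetime_event)), by = "ID") %>%
  mutate(
    # if_else() keeps the POSIXct class; events outside the interval become NA
    datetime_event = if_else(
      datetime_event >= start & datetime_event <= end,
      datetime_event,
      as.POSIXct(NA, tz = "UTC")
    ),
    event = if_else(is.na(datetime_event), NA_character_, "yes")
  ) %>%
  # arrange() sorts NA last, so a matching event (if any) comes first per row
  arrange(row_id, datetime_event) %>%
  group_by(row_id) %>%
  slice(1) %>%
  ungroup()
```

As in the original answer, slice(1) keeps only the earliest matching event when several fall inside the same interval.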