How to copy data from one data frame to another based on multiple conditions

Time: 2020-03-26 14:39:43

Tags: r dataframe datetime

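I want to put data from df2 (which contains df2$id and df2$datetime_event) into df1 (which contains df1$id, df1$datetime_interval, df1$datetime_event and df1$event), based on the following conditions:

if df1$id and df2$id match,

and if df2$datetime_event falls within df1$datetime_interval,

then I want the value of df2$datetime_event copied into df1$datetime_event, and a string (for example "yes") written to the df1$event column.

If the conditions are not met, I want no result (NA).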

So:

df1
ID        datetime_interval                                  datetime_event    event
1       2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC           NA            NA
1       2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC           NA            NA
2       2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC           NA            NA
3       2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC           NA            NA
3       2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC           NA            NA
6       2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC           NA            NA
df2
ID        datetime_event                                  
1       2019-04-19 21:55:00        
3       2019-05-06 21:23:00 
5       2019-07-04 19:45:00 
6       2019-05-06 17:18:00
6       2019-08-03 10:10:00            

I have tried a few things, but none of them worked the way I wanted. I am still missing some steps and do not know how to continue. This is what I have so far:

for(i in seq_along(df1$id)){
  for(j in seq_along(df2$id)){
    ifelse(df2$id[j] ==  df1$id[i]) {
       ifelse(df2$datetime_event[j] %within% df1$datetime_interval[i] == TRUE){
        df1$datetime_event <- df2$datetime_ic_corr[j]
       }
     }
   }
 } 

The result I want looks like this:

df1
ID        datetime_interval                                   datetime_event          event
1       2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC    2019-04-19 21:55:00       yes
1       2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC           NA                  NA
2       2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC           NA                  NA
3       2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC           NA                  NA
3       2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC    2019-05-06 21:23:00        yes
6       2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC    2019-08-03 10:10:00        yes

Thanks in advance for any new input, because I am stuck...

dput(df1)
structure(list(ID = c(1, 1, 2, 3, 3, 6), datetime_interval = c("2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC", 
"2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC", "2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC", 
"2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC", "2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC", 
"2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC"), datetime_event = c("NA", 
"NA", "NA", "NA", "NA", "NA"), event = c("NA", "NA", "NA", "NA", 
"NA", "NA")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))

dput(df2)
structure(list(ID = c(1, 3, 5, 6, 6), datetime_event = c("2019-04-19 21:55:00 UTC", 
"2019-05-06 21:23:00 UTC", "2019-07-04 19:45:00 UTC", "2019-05-06 17:18:00 UTC", 
"2019-08-03 10:10:00 UTC")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

1 answer:

Answer 0: (score: 0)

Tricky problem. I think this works:

library(dplyr)
library(tidyr)

# convert datetime_interval to datetime class start and end columns
# and add row IDs
df1 = df1 %>% 
  separate(datetime_interval, into = c("start", "end"), sep = "--") %>%
  mutate_at(vars(start, end), as.POSIXct) %>%
  select(-datetime_event, -event) %>%
  mutate(row_id = row_number())

# convert datetime event to datetime class
df2 = df2 %>%
  mutate(datetime_event = as.POSIXct(datetime_event))

# join and filter
df1 %>% left_join(df2, by = "ID") %>%
  mutate(
    datetime_event = ifelse(datetime_event >= start & datetime_event <= end, datetime_event, NA),
    event = ifelse(is.na(datetime_event), NA, "yes")
  ) %>%
  arrange(row_id, datetime_event) %>%
  group_by(row_id) %>%
  slice(1)
# # A tibble: 6 x 6
# # Groups:   row_id [6]
#      ID start               end                 row_id datetime_event event
#   <dbl> <dttm>              <dttm>               <int>          <dbl> <chr>
# 1     1 2019-04-19 21:50:00 2019-04-20 21:31:00      1     1555725300 yes  
# 2     1 2019-07-02 04:23:00 2019-07-02 08:51:00      2             NA NA   
# 3     2 2019-07-04 19:45:00 2019-07-05 00:30:00      3             NA NA   
# 4     3 2019-06-07 08:55:00 2019-06-07 14:43:00      4             NA NA   
# 5     3 2019-05-06 17:18:00 2019-05-06 23:18:00      5     1557192180 yes  
# 6     6 2019-08-02 22:00:00 2019-08-04 03:10:00      6     1564841400 yes
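
The datetime_event column in the output above is a <dbl> rather than a <dttm> because ifelse() drops the POSIXct class from its result. In addition, as.POSIXct() without a tz argument parses the strings in the local time zone, even though they are labelled UTC. The following is a minimal sketch of one way to keep the column as a datetime, assuming the data from the dput() output in the question; the names parse_utc, intervals, events, in_interval and result are illustrative and not part of the original answer. Depending on the dplyr version, left_join() may warn about a many-to-many relationship here, which is expected for this data.

```r
library(dplyr)
library(tidyr)

# Data rebuilt from the dput() output in the question
df1 <- tibble(
  ID = c(1, 1, 2, 3, 3, 6),
  datetime_interval = c(
    "2019-04-19 21:50:00 UTC--2019-04-20 21:31:00 UTC",
    "2019-07-02 04:23:00 UTC--2019-07-02 08:51:00 UTC",
    "2019-07-04 19:45:00 UTC--2019-07-05 00:30:00 UTC",
    "2019-06-07 08:55:00 UTC--2019-06-07 14:43:00 UTC",
    "2019-05-06 17:18:00 UTC--2019-05-06 23:18:00 UTC",
    "2019-08-02 22:00:00 UTC--2019-08-04 03:10:00 UTC"
  )
)
df2 <- tibble(
  ID = c(1, 3, 5, 6, 6),
  datetime_event = c(
    "2019-04-19 21:55:00 UTC", "2019-05-06 21:23:00 UTC",
    "2019-07-04 19:45:00 UTC", "2019-05-06 17:18:00 UTC",
    "2019-08-03 10:10:00 UTC"
  )
)

# Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS UTC" strings as UTC.
# The trailing " UTC" is ignored because the format stops at the seconds.
parse_utc <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Split each interval into start/end and remember the original row order
intervals <- df1 %>%
  separate(datetime_interval, into = c("start", "end"), sep = "--") %>%
  mutate(start = parse_utc(start), end = parse_utc(end),
         row_id = row_number())

events <- df2 %>%
  mutate(datetime_event = parse_utc(datetime_event))

result <- intervals %>%
  left_join(events, by = "ID") %>%
  mutate(
    # TRUE only when an event exists and lies inside [start, end]
    in_interval = !is.na(datetime_event) &
      datetime_event >= start & datetime_event <= end,
    # replace() keeps the POSIXct class, unlike ifelse()
    datetime_event = replace(datetime_event, !in_interval, NA),
    event = if_else(in_interval, "yes", NA_character_)
  ) %>%
  # NA sorts last, so a matching event (if any) ends up first per row
  arrange(row_id, datetime_event) %>%
  group_by(row_id) %>%
  slice(1) %>%
  ungroup() %>%
  select(-in_interval)
```

With this variant, result keeps one row per original df1 row, and datetime_event stays a datetime column in UTC. If an interval contains several events, only the earliest is kept, which matches the behaviour of the original answer.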