Data frames not joining with dplyr when using two conditions

Date: 2020-07-21 15:28:45

Tags: r join time dplyr tidyverse

I have a data frame like this:

df1 <- structure(list(Date = c("05/14/2019", "05/14/2019", "05/16/2019", 
"05/17/2019", "05/18/2019", "05/18/2019", "05/20/2019", "05/25/2019", 
"05/26/2019"), TIME = c("10:30 AM", "11:15 AM", "11:00 PM", " 7:36 AM", 
"11:15 AM", " 7:00 PM", " 2:45 PM", " 3:02 AM", "12:40 PM")), row.names = 355:363, class = "data.frame")

This is only a subset, of course. I want to join in information from another data frame:

df2 <- structure(list(Date = c("05/14/2019", "05/14/2019", "05/16/2019", 
"05/17/2019", "05/18/2019", "05/18/2019", "05/20/2019", "05/25/2019", 
"05/26/2019", "05/31/2019"), TIME = c("10:30 AM", "11:15 AM", 
"11:00 PM", "7:36 AM", "11:15 AM", "7:00 PM", "2:45 PM", "3:02 AM", 
"12:40 PM", "2:10 PM"), Event_ = c("71", "68", "03", "38", "58", 
"70", "70", "17", "54", "38")), row.names = 343:352, class = "data.frame")

The join below returns NA for rows that should have a match, and I don't know why it isn't working.

library(dplyr)

df1 %>%
  left_join(df2, by = c('Date', 'TIME'))


structure(list(Date = c("05/14/2019", "05/14/2019", "05/16/2019", 
"05/17/2019", "05/18/2019", "05/18/2019", "05/20/2019", "05/25/2019", 
"05/26/2019"), TIME = c("10:30 AM", "11:15 AM", "11:00 PM", " 7:36 AM", 
"11:15 AM", " 7:00 PM", " 2:45 PM", " 3:02 AM", "12:40 PM"), 
    Event_ = c("71", "68", "03", NA, "58", NA, NA, NA, "54")), row.names = c(NA, 
-9L), class = "data.frame")

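What would cause the join to work only some of the time? In the larger data frame, the join succeeds for only about 1/4 of the rows, and I'm confused as to why.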

2 Answers:

Answer 0: (score: 1)

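Note: your columns are not actually date-time objects. They are just strings that represent dates and times, so whitespace, punctuation, and capitalization must be identical for two values to match.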
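To make the point concrete, here is a minimal base R sketch comparing one of the `TIME` values from `df1` with its counterpart in `df2`. The variable names `a` and `b` are only for illustration.

```r
# Two strings that print almost identically but are not equal
a <- " 7:36 AM"   # value as stored in df1 (note the leading space)
b <- "7:36 AM"    # value as stored in df2

a == b            # FALSE: the leading space is part of the string
nchar(c(a, b))    # 8 7
trimws(a) == b    # TRUE once the whitespace is removed
```

A join on character columns compares keys in exactly this way, so a single stray space is enough to turn a match into an NA.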

Some of the times in df1 have leading whitespace.
Trim the whitespace and the join should work as expected.

# Strip leading/trailing whitespace from the key column, then join
df1$TIME <- trimws(df1$TIME)
df1 %>%
  left_join(df2, by = c('Date', 'TIME'))

        Date     TIME Event_
1 05/14/2019 10:30 AM     71
2 05/14/2019 11:15 AM     68
3 05/16/2019 11:00 PM     03
4 05/17/2019  7:36 AM     38
5 05/18/2019 11:15 AM     58
6 05/18/2019  7:00 PM     70
7 05/20/2019  2:45 PM     70
8 05/25/2019  3:02 AM     17
9 05/26/2019 12:40 PM     54
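For a larger data frame where the culprit is not obvious, something along these lines may help locate the failing keys before fixing them. This is only a sketch: `left` and `right` are small hypothetical stand-ins for `df1` and `df2`, and `has_padding` is a made-up column name.

```r
library(dplyr)

# Small stand-ins for df1 and df2 (hypothetical subset of the question's data)
left  <- data.frame(Date = c("05/14/2019", "05/17/2019"),
                    TIME = c("10:30 AM", " 7:36 AM"))
right <- data.frame(Date = c("05/14/2019", "05/17/2019"),
                    TIME = c("10:30 AM", "7:36 AM"),
                    Event_ = c("71", "38"))

# Rows of `left` whose key has no partner in `right`
unmatched <- anti_join(left, right, by = c("Date", "TIME"))

# Flag keys that change when trimmed, i.e. carry leading/trailing whitespace
unmatched$has_padding <- unmatched$TIME != trimws(unmatched$TIME)
print(unmatched)

# Trim the key columns on both sides before joining
left_clean  <- mutate(left,  Date = trimws(Date), TIME = trimws(TIME))
right_clean <- mutate(right, Date = trimws(Date), TIME = trimws(TIME))
joined <- left_join(left_clean, right_clean, by = c("Date", "TIME"))
```

Trimming both tables rather than only `df1` is a defensive choice, since in real data the padding could be on either side.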

Answer 1: (score: 1)

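As Dave2e mentioned, R currently treats your dates and times as plain strings. In this case, trimming the whitespace works perfectly well. If you want to convert them to date-time (POSIXlt) objects instead, you can do the following: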

# format date-time
df1$datetime <- strptime(paste(df1[,1], df1[,2]), '%m/%d/%Y %I:%M %p')
df2$datetime <- strptime(paste(df2[,1], df2[,2]), '%m/%d/%Y %I:%M %p')

# (Optional) remove old date time columns
df1 <- df1[-c(1:2)]
df2 <- df2[-c(1:2)]

df1 %>% 
  left_join(df2, by = 'datetime')

             datetime Event_
1 2019-05-14 10:30:00     71
2 2019-05-14 11:15:00     68
3 2019-05-16 23:00:00     03
4 2019-05-17 07:36:00     38
5 2019-05-18 11:15:00     58
6 2019-05-18 19:00:00     70
7 2019-05-20 14:45:00     70
8 2019-05-25 03:02:00     17
9 2019-05-26 12:40:00     54

This conversion isn't necessary for the merge in this case, but it can be useful for other tasks such as plotting a time series.
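A variation on the same idea is to store the key as POSIXct rather than POSIXlt. As far as I know, some dplyr versions (before 1.0.0) refuse POSIXlt columns, so POSIXct is the safer choice for a join key. The sketch below is not the answer's original code: `to_datetime` is a hypothetical helper, `tz = "UTC"` is an assumption, and `left`/`right` are small stand-ins for `df1`/`df2`. Parsing `%p` depends on the locale recognising "AM"/"PM".

```r
library(dplyr)

# Hypothetical helper: build a POSIXct key from separate date and time strings.
# tz = "UTC" is an assumption; use the zone the data was recorded in.
to_datetime <- function(date, time, tz = "UTC") {
  as.POSIXct(paste(date, trimws(time)), format = "%m/%d/%Y %I:%M %p", tz = tz)
}

# Small stand-ins for df1 and df2 (hypothetical subset of the question's data)
left  <- data.frame(Date = c("05/16/2019", "05/17/2019"),
                    TIME = c("11:00 PM", " 7:36 AM"))
right <- data.frame(Date = c("05/16/2019", "05/17/2019"),
                    TIME = c("11:00 PM", "7:36 AM"),
                    Event_ = c("03", "38"))

left$datetime  <- to_datetime(left$Date,  left$TIME)
right$datetime <- to_datetime(right$Date, right$TIME)

# Keep the original columns on the left; bring over only the key and Event_
joined <- left_join(left, select(right, datetime, Event_), by = "datetime")
```

Keeping the original `Date` and `TIME` columns on the left side makes it easy to check the parsed values against the source strings.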