I have a data frame like this:
df1 <- structure(list(Date = c("05/14/2019", "05/14/2019", "05/16/2019",
"05/17/2019", "05/18/2019", "05/18/2019", "05/20/2019", "05/25/2019",
"05/26/2019"), TIME = c("10:30 AM", "11:15 AM", "11:00 PM", " 7:36 AM",
"11:15 AM", " 7:00 PM", " 2:45 PM", " 3:02 AM", "12:40 PM")), row.names = 355:363, class = "data.frame")
This is only a subset, of course. I want to join in information from another data frame:
df2 <- structure(list(Date = c("05/14/2019", "05/14/2019", "05/16/2019",
"05/17/2019", "05/18/2019", "05/18/2019", "05/20/2019", "05/25/2019",
"05/26/2019", "05/31/2019"), TIME = c("10:30 AM", "11:15 AM",
"11:00 PM", "7:36 AM", "11:15 AM", "7:00 PM", "2:45 PM", "3:02 AM",
"12:40 PM", "2:10 PM"), Event_ = c("71", "68", "03", "38", "58",
"70", "70", "17", "54", "38")), row.names = 343:352, class = "data.frame")
The join gives me the result below, with NA in rows where a match should exist. I don't know why it isn't working.
library(dplyr)
df1 %>%
  left_join(df2, by = c('Date', 'TIME'))
structure(list(Date = c("05/14/2019", "05/14/2019", "05/16/2019",
"05/17/2019", "05/18/2019", "05/18/2019", "05/20/2019", "05/25/2019",
"05/26/2019"), TIME = c("10:30 AM", "11:15 AM", "11:00 PM", " 7:36 AM",
"11:15 AM", " 7:00 PM", " 2:45 PM", " 3:02 AM", "12:40 PM"),
Event_ = c("71", "68", "03", NA, "58", NA, NA, NA, "54")), row.names = c(NA,
-9L), class = "data.frame")
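What could make the join work only some of the time? On the larger data frame the join only succeeds for about 1/4 of the rows, and I can't figure out why.
Answer 0 (score: 1)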
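Note: your columns are not actually date-time objects, just strings that represent dates and times, so whitespace, punctuation, and capitalization all have to match exactly.
Some of the times in df1 have leading whitespace (for example " 7:36 AM" in df1 versus "7:36 AM" in df2).
Trim the whitespace and the join should work as expected.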
# Strip leading and trailing whitespace from the time strings
df1$TIME <- trimws(df1$TIME)
df1 %>%
  left_join(df2, by = c('Date', 'TIME'))
Date TIME Event_
1 05/14/2019 10:30 AM 71
2 05/14/2019 11:15 AM 68
3 05/16/2019 11:00 PM 03
4 05/17/2019 7:36 AM 38
5 05/18/2019 11:15 AM 58
6 05/18/2019 7:00 PM 70
7 05/20/2019 2:45 PM 70
8 05/25/2019 3:02 AM 17
9 05/26/2019 12:40 PM 54
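A quick way to confirm that stray whitespace is the cause, before changing anything, is to compare each value with its trimmed form. The sketch below uses only base R on a small made-up vector (`times` is a hypothetical stand-in for `df1$TIME`), so it does not depend on the data above.

```r
# Hypothetical stand-in for df1$TIME; two values carry a leading space.
times <- c("10:30 AM", " 7:36 AM", "11:15 AM", " 7:00 PM")

# String matching is exact, so a padded value never equals its trimmed twin.
same <- " 7:36 AM" == "7:36 AM"
print(same)  # FALSE

# Flag the values that trimws() would change.
padded <- times != trimws(times)
print(which(padded))  # positions 2 and 4

# Wrap the offenders in brackets so the extra space is visible when printed.
print(paste0("[", times[padded], "]"))
```

If a larger data frame shows the same pattern, applying `trimws()` to the key columns of both data frames before joining is a reasonable default.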
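Answer 1 (score: 1)
As Dave2e mentioned, R currently treats your dates and times as plain strings. Trimming the whitespace works perfectly well in this case. If you would rather convert them to date-time (POSIXlt) objects, you can do the following: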
# format date-time
df1$datetime <- strptime(paste(df1[,1], df1[,2]), '%m/%d/%Y %I:%M %p')
df2$datetime <- strptime(paste(df2[,1], df2[,2]), '%m/%d/%Y %I:%M %p')
# (Optional) remove old date time columns
df1 <- df1[-c(1:2)]
df2 <- df2[-c(1:2)]
df1 %>%
left_join(df2, by = 'datetime')
datetime Event_
1 2019-05-14 10:30:00 71
2 2019-05-14 11:15:00 68
3 2019-05-16 23:00:00 03
4 2019-05-17 07:36:00 38
5 2019-05-18 11:15:00 58
6 2019-05-18 19:00:00 70
7 2019-05-20 14:45:00 70
8 2019-05-25 03:02:00 17
9 2019-05-26 12:40:00 54
The conversion isn't necessary for the merge in this case, but it can be useful for other tasks such as plotting time series.
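One caveat, offered as a suggestion rather than a correction: `strptime()` returns a `POSIXlt` object, a list-based class that some data-frame tools handle awkwardly as a column. A common alternative is to wrap the parse in `as.POSIXct()`. The sketch below uses base R's `merge()` on two small made-up data frames (`a`, `b`, the helper `to_datetime`, and the `tz = "UTC"` choice are all assumptions for illustration). It also shows that parsing absorbs the leading space, since whitespace in the format string matches any run of whitespace in the input.

```r
# Hypothetical miniature versions of the two data frames.
a <- data.frame(Date = c("05/17/2019", "05/18/2019"),
                TIME = c(" 7:36 AM", "11:15 AM"),
                stringsAsFactors = FALSE)
b <- data.frame(Date = c("05/17/2019", "05/18/2019"),
                TIME = c("7:36 AM", "11:15 AM"),
                Event_ = c("38", "58"),
                stringsAsFactors = FALSE)

# Parse to POSIXct. tz = "UTC" is an arbitrary choice that sidesteps
# daylight-saving gaps; %p relies on the locale's AM/PM strings.
to_datetime <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%m/%d/%Y %I:%M %p", tz = "UTC")
}
a$datetime <- to_datetime(a$Date, a$TIME)
b$datetime <- to_datetime(b$Date, b$TIME)

# Left join on the parsed key (all.x = TRUE keeps every row of a).
joined <- merge(a["datetime"], b[c("datetime", "Event_")],
                by = "datetime", all.x = TRUE)
print(joined)
```

Both rows of `a` find their match here even though one time string is padded, because the join key is the parsed timestamp and not the raw text.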