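I'm hitting a wall here and hope someone can help.
I have an aggregated data frame (d1) in R with a time column and a column of binary values. The time column does not have a uniform time step.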
D1:
Time Set
1: 2015-01-03 14:55:00 0
2: 2015-01-06 14:20:00 1
3: 2015-01-06 14:25:00 1
4: 2015-01-06 14:30:00 1
5: 2015-01-06 14:35:00 1
6: 2015-01-06 14:40:00 1
7: 2015-01-06 14:45:00 0
8: 2015-01-06 16:10:00 1
9: 2015-01-07 07:45:00 0
10: 2015-01-07 08:00:00 1
11: 2015-01-07 08:05:00 1
12: 2015-01-07 08:45:00 0
I also have another data frame (d2) whose time column has a uniform time step, so d2 has more rows than d1.
D2:
Time_Ideal
1: 2015-01-09 14:05:00
2: 2015-01-09 14:10:00
3: 2015-01-09 14:15:00
4: 2015-01-09 14:20:00
5: 2015-01-09 14:25:00
6: 2015-01-09 14:30:00
7: 2015-01-09 14:35:00
8: 2015-01-09 14:40:00
9: 2015-01-09 14:45:00
10: 2015-01-09 14:50:00
What I want to do is print the Set value next to Time_Ideal wherever the time values in the two time columns of d1 and d2 match.
I tried:
d1 <- data.table(d1, key = 'Time')
d2 <- data.table(d2, key = 'Time_Ideal')
d2[d1, nomatch=0]
d2[d1]
inspired by this SO post,
but I can't get it to work.
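Answer 0 (score: 3)
Here is the data.table way of solving this (since that is what the question actually asks for). Using the modified data provided by @bergant (because the OP's two data sets have no matching times), simply do: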
library(data.table)
setkey(setDT(d1), Time) # `d2` doesn't have to be a `data.table`
d1[d2] # you can set `, nomatch = 0L` if you want to remove non-matches
# Time Set
# 1: 2015-01-09 15:05:00 NA
# 2: 2015-01-09 15:10:00 NA
# 3: 2015-01-09 15:15:00 NA
# 4: 2015-01-09 15:20:00 1
# 5: 2015-01-09 15:25:00 1
# 6: 2015-01-09 15:30:00 1
# 7: 2015-01-09 15:35:00 1
# 8: 2015-01-09 15:40:00 1
# 9: 2015-01-09 15:45:00 0
# 10: 2015-01-09 15:50:00 NA
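The direction of the join matters here: in `X[Y]`, each row of `Y` is looked up in `X`, so the result has one row per row of `Y`. A minimal sketch on made-up toy data (the names `irregular`, `grid`, `on_grid` and `on_irregular` are hypothetical, and UTC is assumed), using `on =` instead of keys:

```r
library(data.table)

# Hypothetical toy data: an irregular series and a regular 5-minute grid (UTC).
irregular <- data.table(
  Time = as.POSIXct(c("2015-01-09 14:20:00", "2015-01-09 14:30:00"), tz = "UTC"),
  Set  = c(1L, 0L)
)
grid <- data.table(
  Time_Ideal = seq(as.POSIXct("2015-01-09 14:15:00", tz = "UTC"),
                   by = "5 min", length.out = 4)
)

# X[Y]: look up every grid time in `irregular`; one row per grid row,
# with NA in Set where the grid time has no exact match.
on_grid <- irregular[grid, on = c(Time = "Time_Ideal")]

# Y[X]: the reverse lookup; one row per row of `irregular`.
on_irregular <- grid[irregular, on = c(Time_Ideal = "Time")]
```

This is probably why `d2[d1]` in the question returns one row per row of `d1` rather than one row per ideal time step.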
Another (better) way is to modify d2 by reference. You first have to convert d2 to a data.table and then key it:
setkey(setDT(d2), Time_Ideal)
d2[d1, Set := i.Set][] # `d2` was modified by reference.
# Time_Ideal Set
# 1: 2015-01-09 15:05:00 NA
# 2: 2015-01-09 15:10:00 NA
# 3: 2015-01-09 15:15:00 NA
# 4: 2015-01-09 15:20:00 1
# 5: 2015-01-09 15:25:00 1
# 6: 2015-01-09 15:30:00 1
# 7: 2015-01-09 15:35:00 1
# 8: 2015-01-09 15:40:00 1
# 9: 2015-01-09 15:45:00 0
# 10: 2015-01-09 15:50:00 NA
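Because `:=` changes the table in place, it can help to take a `copy()` first if the original is still needed. A small sketch under the same assumptions as before (toy data, hypothetical names `grid`, `irregular`, `grid_before`), using `on =` so that no key has to be set:

```r
library(data.table)

# Hypothetical regular grid: four 5-minute steps (UTC).
grid <- data.table(
  Time_Ideal = as.POSIXct("2015-01-09 14:15:00", tz = "UTC") + 300 * 0:3
)
# Hypothetical irregular series that hits the 2nd and 4th grid times.
irregular <- data.table(Time = grid$Time_Ideal[c(2, 4)], Set = c(1L, 0L))

grid_before <- copy(grid)  # deep copy; unaffected by the update below

# Update join: add Set to `grid` by reference; unmatched rows stay NA.
grid[irregular, on = c(Time_Ideal = "Time"), Set := i.Set]
```

After this, `grid` has the columns `Time_Ideal` and `Set`, while `grid_before` still has only `Time_Ideal`.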
Answer 1 (score: 1)
Maybe with dplyr?
library(dplyr)
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time"))
To fill the gaps with the last observed value of Set, use:
library(dplyr)
library(zoo)
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time")) %>%
mutate(Set = na.locf(Set, na.rm = FALSE))
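For reference, this is how `na.locf` from zoo behaves on a plain vector (the vector `x` is made up for illustration). With `na.rm = FALSE` the length is preserved and a leading NA stays NA, since there is no earlier value to carry forward:

```r
library(zoo)

# Hypothetical Set column after a left join, with gaps.
x <- c(NA, 1, NA, NA, 0, NA)

# Last observation carried forward; na.rm = FALSE keeps the leading NA.
filled <- na.locf(x, na.rm = FALSE)
filled
# NA 1 1 1 0 0
```

Keeping the length unchanged is what makes the result safe to assign back inside `mutate()`.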
The question gives no hint about which date-time type is used. I'm using POSIXct below:
d1 <-
structure(list(Time = structure(c(1420293300, 1420550400, 1420550700,
1420551000, 1420551300, 1420551600, 1420551900, 1420557000, 1420613100,
1420614000, 1420614300, 1420616700), class = c("POSIXct", "POSIXt"
), tzone = ""),
Set = c(0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L,
1L, 0L)), row.names = c(NA, -12L), .Names = c("Time", "Set"),
class = "data.frame")
d2 <-
structure(list(Time_Ideal = structure(c(1420808700, 1420809000,
1420809300, 1420809600, 1420809900, 1420810200, 1420810500, 1420810800,
1420811100, 1420811400), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, -10L), .Names = "Time_Ideal",
class = "data.frame")
The dates do not intersect (all d1 times < all d2 times), so we get only NAs:
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time"))
Time_Ideal Set
1 2015-01-09 14:05:00 NA
2 2015-01-09 14:10:00 NA
3 2015-01-09 14:15:00 NA
4 2015-01-09 14:20:00 NA
5 2015-01-09 14:25:00 NA
6 2015-01-09 14:30:00 NA
7 2015-01-09 14:35:00 NA
8 2015-01-09 14:40:00 NA
9 2015-01-09 14:45:00 NA
10 2015-01-09 14:50:00 NA
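A quick way to confirm that two time columns cannot match is to compare their ranges and test for exact matches before joining. A minimal base R sketch, with two made-up vectors `a` and `b` standing in for `d1$Time` and `d2$Time_Ideal` (UTC assumed):

```r
# Hypothetical first/last times of the two columns (UTC assumed).
a <- as.POSIXct(c("2015-01-03 14:55:00", "2015-01-07 08:45:00"), tz = "UTC")
b <- as.POSIXct(c("2015-01-09 14:05:00", "2015-01-09 14:50:00"), tz = "UTC")

# Does any time in b occur exactly in a?
any_exact_match <- any(b %in% a)

# Do the two time ranges overlap at all?
ranges_overlap <- max(a) >= min(b) && max(b) >= min(a)
```

Both values are `FALSE` here, which is consistent with a join that returns only NAs.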
Shift d1 three days into the future:
d1$Time <- d1$Time + 3600*24*3 # three days shift
And run the join again:
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time"))
Time_Ideal Set
1 2015-01-09 14:05:00 NA
2 2015-01-09 14:10:00 NA
3 2015-01-09 14:15:00 NA
4 2015-01-09 14:20:00 1
5 2015-01-09 14:25:00 1
6 2015-01-09 14:30:00 1
7 2015-01-09 14:35:00 1
8 2015-01-09 14:40:00 1
9 2015-01-09 14:45:00 0
10 2015-01-09 14:50:00 NA
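The three-day shift above adds raw seconds to the POSIXct values. An equivalent and arguably more readable form uses `as.difftime`; the sketch below assumes UTC and a single made-up timestamp `t0`:

```r
# Hypothetical timestamp (UTC assumed).
t0 <- as.POSIXct("2015-01-06 14:20:00", tz = "UTC")

# Shift by three days expressed in seconds...
shifted_secs <- t0 + 3600 * 24 * 3

# ...or as an explicit difftime of three days.
shifted_diff <- t0 + as.difftime(3, units = "days")

format(shifted_diff, "%Y-%m-%d %H:%M:%S")
# "2015-01-09 14:20:00"
```

Both forms add exactly 72 hours, so in a time zone with daylight saving the wall-clock time can move by an hour across a transition.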
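Answer 2 (score: 0)
Probably not the best solution, but I think it works:
library(plyr)
d3 <- d2
colnames(d3) <- c("Time") # rename so both frames share the join column
d4 <- join(d3, d1) # left join on the common column `Time`
# carry the previous Set value forward into each NA (a leading NA stays NA)
for (i in seq_along(d4$Set)[-1]) {
  if (is.na(d4$Set[i])) {
    d4$Set[i] <- d4$Set[i - 1]
  }
}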
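The same forward fill can be written without a loop. The helper below is a hypothetical base R sketch (`locf` is not part of plyr or any other package mentioned in this thread): it records, for each position, the index of the most recent non-NA value and then indexes the vector with it.

```r
# Hypothetical helper: last observation carried forward, base R only.
locf <- function(x) {
  # Index of the most recent non-NA element at or before each position
  # (0 while no non-NA value has been seen yet).
  last_seen <- cummax(seq_along(x) * !is.na(x))
  # Leading NAs have nothing to carry forward, so they remain NA.
  last_seen[last_seen == 0L] <- NA
  x[last_seen]
}

locf(c(NA, 1L, NA, NA, 0L, NA))
# NA 1 1 1 0 0
```

Applied to the joined frame, this would be `d4$Set <- locf(d4$Set)`.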