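I'm trying to replace the NAs in `test` with the forecast values in `forecast`. I've been trying to use match, but I can't figure it out. Keep in mind that id and time together form a two-part unique ID. Any suggestions? (Also keep in mind that my dataset is much larger than this example: about 32000 rows.)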
test = data.frame(id =c(1,1,1,2,2,2), time=c(89,99,109,89,99,109), data=c(3,4,NA,5,2,NA))
forecast = data.frame(id =c(1,2), time=c(109,109), data=c(5,1))
Desired output:
out = data.frame(id =c(1,1,1,2,2,2), time=c(89,99,109,89,99,109), data=c(3,4,5,5,2,1))
Answer 0 (score: 2)
Here is a data.table solution:
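library(data.table)
test_dt <- data.table(test, key = c('id', 'time'))
forecast_dt <- data.table(forecast, key = c('id', 'time'))
# forecast_dt[test_dt] keeps every row of test; test's own data column shows up
# as i.data in recent data.table versions (older versions named it data.1)
out <- forecast_dt[test_dt][, data := ifelse(is.na(data), i.data, data)]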
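The same idea can also be written as an update join, which modifies `test_dt` in place instead of building a joined copy. This is only a sketch: it assumes a data.table version that supports the `on =` argument, and it rebuilds the example data so that it runs on its own.

```r
library(data.table)

# Example data from the question
test <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                   time = c(89, 99, 109, 89, 99, 109),
                   data = c(3, 4, NA, 5, 2, NA))
forecast <- data.frame(id = c(1, 2), time = c(109, 109), data = c(5, 1))

test_dt <- as.data.table(test)
forecast_dt <- as.data.table(forecast)

# Update join: for rows of test_dt that match forecast_dt on (id, time),
# keep the existing value unless it is NA, in which case take the
# forecast value (i.data refers to forecast_dt's data column).
test_dt[forecast_dt, on = c("id", "time"),
        data := ifelse(is.na(data), i.data, data)]

print(test_dt)
```

Because `:=` updates by reference, no extra column is left behind and nothing needs to be cleaned up afterwards.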
EDIT. Benchmark: even on this small dataset, data.table is about 2.6 times faster.
library(rbenchmark)
f_merge <- function(){
  out2 <- merge(test, forecast, by = c("id", "time"), all.x = TRUE)
  out2 <- transform(out2,
    newdata = ifelse(is.na(data.x), data.y, data.x), data.x = NULL, data.y = NULL)
  return(out2)
}
f_dtable <- function(){
  test <- data.table(test, key = c('id', 'time'))
  forecast <- data.table(forecast, key = c('id', 'time'))
  # test's data column is named i.data after the join in recent data.table
  # versions (data.1 in older ones)
  test <- forecast[test][, data := ifelse(is.na(data), i.data, data)]
  test$i.data <- NULL
  return(test)
}
benchmark(f_merge(), f_dtable(), order = 'relative',
          columns = c('test', 'elapsed', 'relative'))
        test elapsed relative
2 f_dtable()    0.86     1.00
1  f_merge()    2.26     2.63
Answer 1 (score: 1)
I would use merge to join the data together and then compute the new column, in two steps:
out2 <- merge(test, forecast, by = c("id", "time"), all.x = TRUE)
> out2
  id time data.x data.y
1  1   89      3     NA
2  1   99      4     NA
3  1  109     NA      5
4  2   89      5     NA
5  2   99      2     NA
6  2  109     NA      1
#Compute new variable and clean up old ones:
out2 <- transform(out2, newdata = ifelse(is.na(data.x), data.y, data.x), data.x = NULL, data.y = NULL)
> out2
  id time newdata
1  1   89       3
2  1   99       4
3  1  109       5
4  2   89       5
5  2   99       2
6  2  109       1
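To check the merge result against the desired output from the question, the new column can be renamed back to `data` and the two data frames compared. This is a small sketch in base R that rebuilds the example data so it runs on its own; the rename step is an addition for illustration, not part of the answer above.

```r
# Example data and desired output from the question
test <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                   time = c(89, 99, 109, 89, 99, 109),
                   data = c(3, 4, NA, 5, 2, NA))
forecast <- data.frame(id = c(1, 2), time = c(109, 109), data = c(5, 1))
out <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  time = c(89, 99, 109, 89, 99, 109),
                  data = c(3, 4, 5, 5, 2, 1))

# Left join on the two-part key, then fill NAs from the forecast column
out2 <- merge(test, forecast, by = c("id", "time"), all.x = TRUE)
out2 <- transform(out2,
                  newdata = ifelse(is.na(data.x), data.y, data.x),
                  data.x = NULL, data.y = NULL)

# Rename so the columns line up with the desired output
names(out2)[names(out2) == "newdata"] <- "data"

# Compare with the desired output
print(all.equal(out2, out))
```

Note that merge() sorts its result on the `by` columns by default, so the row-by-row comparison lines up here only because `test` is already ordered by id and time.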
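Answer 2 (score: 0)
Try this (it assumes the rows of forecast line up one-to-one, in the same order, with the NA rows of test):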
test$data[is.na(test$data)] <- forecast[((forecast$id %in% test$id) & (forecast$time %in% test$time)),]$data
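The one-liner above checks id and time separately and relies on row order, so it can misassign values when the forecast rows are not in the same order as the NA rows. Since the question asks about match, here is a minimal base R sketch that matches on a composite key instead. The variable names `key_test`, `key_fc`, `idx` and `fill` and the "_" separator are illustrative choices, not something from the original posts.

```r
# Example data from the question
test <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                   time = c(89, 99, 109, 89, 99, 109),
                   data = c(3, 4, NA, 5, 2, NA))
forecast <- data.frame(id = c(1, 2), time = c(109, 109), data = c(5, 1))

# Build a single key from the two-part ID in each data frame
key_test <- paste(test$id, test$time, sep = "_")
key_fc   <- paste(forecast$id, forecast$time, sep = "_")

# For each row of test, the position of the matching forecast row (NA if none)
idx <- match(key_test, key_fc)

# Only fill rows that are NA in test and actually have a forecast
fill <- is.na(test$data) & !is.na(idx)
test$data[fill] <- forecast$data[idx[fill]]

print(test)
```

This works regardless of the row order in either data frame, and NAs without a matching forecast are simply left as NA.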