Is it possible to do the equivalent of merge(..., all = TRUE) using data.table syntax (e.g. X[Y])?
Specifically, I need a very fast way to get the result of:
library(data.table)
item_length = data.table(index = 1:10, length = rep(c(2, 5, 4, 6, 3), 2), key = "index")
item_weigth = data.table(index = c(2, 4, 6, 7, 8, 11), weight = rep(c(.3, .5, .2), 2), key = "index")
merge(item_length, item_weigth, all = TRUE)
which returns:
> merge(item_length ,item_weigth , all=TRUE)
index length weight
[1,] 1 2 NA
[2,] 2 5 0.3
[3,] 3 4 NA
[4,] 4 6 0.5
[5,] 5 3 NA
[6,] 6 2 0.2
[7,] 7 5 0.3
[8,] 8 4 0.5
[9,] 9 6 NA
[10,] 10 3 NA
[11,] 11 NA 0.2
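To address the X[Y] part of the question directly, a full outer join can be assembled from two keyed joins. This is a sketch under stated assumptions, not the method used in the answer below. It collects the union of keys, looks them up in item_weigth (unmatched keys get an NA weight), and then joins item_length onto that result (unmatched keys get an NA length). It assumes a recent data.table release, integer index columns on both sides, and recycling written out with rep(). Older versions such as 1.8.0 may name join columns differently.

```r
library(data.table)

# Same data as the question; index is integer in both tables so the join types match
item_length <- data.table(index = 1:10,
                          length = rep(c(2, 5, 4, 6, 3), 2),
                          key = "index")
item_weigth <- data.table(index = c(2L, 4L, 6L, 7L, 8L, 11L),
                          weight = rep(c(.3, .5, .2), 2),
                          key = "index")

# 1. Every key that appears in either table, sorted and without duplicates
all_ids <- sort(unique(c(item_length$index, item_weigth$index)))

# 2. Look all keys up in item_weigth (default nomatch = NA keeps unmatched keys)
# 3. Join item_length onto that result (unmatched keys get NA length)
full <- item_length[item_weigth[J(all_ids)]]
print(full)
```

The result should have the same 11 rows as the merge() output above, with columns index, length and weight.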
Answer (score: 13)
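Sorry to answer my own question, but I think it's worth sharing:
A very fast solution seems to be updating to the latest version of data.table (1.8.0). (Many thanks, Matthew!)
Here are my test data and benchmark results:
With data.table: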
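Because the fix here is a package upgrade, it may help to check which data.table version is actually installed. The following is a minimal sketch using base R's packageVersion(). The 1.8.0 threshold is the version named above. The install line is commented out because it needs network access.

```r
library(data.table)

# Upgrade first if needed (requires network access):
# install.packages("data.table")

installed <- packageVersion("data.table")
print(installed)

# package_version objects compare correctly against version strings
is_recent_enough <- installed >= "1.8.0"
print(is_recent_enough)
```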
library(data.table)
full_index <- 1:5000000
ratio_in_samples <- 0.8
x <- data.table(index = sample(full_index, length(full_index)*ratio_in_samples),
var1 = rnorm(length(full_index)*ratio_in_samples),
key = "index")
y <- data.table(index = sample(full_index, length(full_index)*ratio_in_samples),
var2 = rnorm(length(full_index)*ratio_in_samples),
key = "index")
system.time(
result <- merge(x,y, all=TRUE)
)
Timing with data.table:
user system elapsed
5.05 0.55 5.62
And with data.frame:
full_index <- 1:5000000
ratio_in_samples <- 0.8
x <- data.frame(index = sample(full_index, length(full_index)*ratio_in_samples),
var1 = rnorm(length(full_index)*ratio_in_samples))
y <- data.frame(index = sample(full_index, length(full_index)*ratio_in_samples),
var2 = rnorm(length(full_index)*ratio_in_samples))
system.time(
result <- merge(x,y, all=TRUE)
)
Timing with data.frame:
user system elapsed
78.83 1.75 80.67
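A timing comparison like this only matters if both calls return the same rows. The following is a minimal sanity check, not part of the original benchmark. It uses a hypothetical much smaller size (1000 instead of 5000000) and a hypothetical fixed seed so it runs quickly and reproducibly. It merges the same data once as data.frames and once as keyed data.tables, then compares the results column by column.

```r
library(data.table)

set.seed(42)                 # hypothetical seed, only for reproducibility
full_index <- 1:1000         # hypothetical small size instead of 5000000
ratio_in_samples <- 0.8
n <- length(full_index) * ratio_in_samples

# The same random data, first as plain data.frames
x_df <- data.frame(index = sample(full_index, n), var1 = rnorm(n))
y_df <- data.frame(index = sample(full_index, n), var2 = rnorm(n))

# Keyed data.table copies of exactly the same rows
x_dt <- as.data.table(x_df)
setkey(x_dt, index)
y_dt <- as.data.table(y_df)
setkey(y_dt, index)

# Full outer joins: base merge.data.frame vs merge.data.table
res_df <- merge(x_df, y_df, all = TRUE)
res_dt <- merge(x_dt, y_dt, all = TRUE)

# Both results are sorted by index (sort = TRUE is the default for both),
# so they can be compared column by column
same_rows  <- nrow(res_df) == nrow(res_dt)
same_index <- identical(res_df$index, res_dt$index)
same_var1  <- isTRUE(all.equal(res_df$var1, res_dt$var1))
same_var2  <- isTRUE(all.equal(res_df$var2, res_dt$var2))
print(c(same_rows, same_index, same_var1, same_var2))
```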