data.table syntax for split-apply-combine à la plyr

Asked: 2015-08-04 20:58:24

Tags: r data.table plyr

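I have just started learning data.table and am working through the vignettes, although I am already using it in a project. How can I replace some plyr syntax with data.table?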

library(data.table)

input <- data.table(ID = c(37, 45, 900), a1 = c(1, 2, 3), a2 = c(43, 320, 390),
                    b1 = c(-0.94, 2.2, -1.223), b2 = c(2.32, 4.54, 7.21), c1 = c(1, 2, 3),
                    c2 = c(-0.94, 2.2, -1.223))

# simple user defined function that conveys my problem
func <- function(x, num) {
  x <- data.table(x)
  new_b <- x$b1[1]
  x2 <- within(x[1,], {
    b1 = new_b
    b2 = 51
  })
  imp <- rbindlist(replicate(num, x2, simplify= FALSE))
  return(rbindlist(list(x, imp)))
}

# wrapper function: dispatches to plyr or data.table
wrap_func <- function(dat, num= 5, plyr= FALSE) {
  if (plyr == TRUE) {
    return(plyr::ddply(dat, .variables= "ID", .fun= func, num= num))
  } else {
    return(dat[, lapply(.SD, FUN= func, num), by= ID])
  }
}

The plyr version works as expected:

wrap_func(dat=input, 5, plyr=TRUE)

What is the equivalent data.table syntax?

wrap_func(dat=input, num=5, plyr=FALSE) # gives error
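One likely cause of the error: lapply(.SD, FUN= func, num) calls func once per column, so func receives a single vector rather than the rows of a group, and the b1 and b2 columns it expects are not there. A minimal sketch of a per-group call, assuming func should see each group's rows as .SD (this is a common data.table idiom, not necessarily the approach suggested in the comments):

```r
library(data.table)

input <- data.table(ID = c(37, 45, 900), a1 = c(1, 2, 3), a2 = c(43, 320, 390),
                    b1 = c(-0.94, 2.2, -1.223), b2 = c(2.32, 4.54, 7.21),
                    c1 = c(1, 2, 3), c2 = c(-0.94, 2.2, -1.223))

# Same function as in the question
func <- function(x, num) {
  x <- data.table(x)
  new_b <- x$b1[1]
  x2 <- within(x[1, ], {
    b1 = new_b
    b2 = 51
  })
  imp <- rbindlist(replicate(num, x2, simplify = FALSE))
  rbindlist(list(x, imp))
}

# Call func once per ID group; .SD holds that group's rows (without ID),
# and data.table adds the ID column back to the combined result.
res <- input[, func(.SD, num = 5), by = ID]
```

Note that .SD does not contain the grouping column, so func must not rely on ID being present in x. In this sketch, each group should contribute its original row plus num imputed rows.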

Thanks in advance!

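Update

Based on @Frank's suggestion in the comments, I benchmarked this on my real data and code. Here, impute_zero_resp_all is effectively equivalent to wrap_func in the example.

I started with a dataset of ~50k rows and 1800 groups; imputation is done by group and produces a dataset of ~170k rows with the same 1800 groups: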

vec1 <- vec2 <- vector(mode= "numeric", length= 50)
for (i in 1:50) {
  vec1[i] <- system.time(impute_zero_resp_all(dat= test_dat2))[3] #DT
  vec2[i] <- system.time(impute_zero_resp_all2(dat= test_dat2))[3] #PLYR 
}

summary(vec1); summary(vec2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  22.62   22.76   22.81   22.84   22.84   23.72 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  27.19   27.35   27.40   27.49   27.45   30.07

quantile(vec1, seq(0,1,.1))
    0%    10%    20%    30%    40%    50%    60%    70%    80%    90%   100% 
22.620 22.670 22.728 22.760 22.786 22.810 22.824 22.840 22.870 22.917 23.720 
quantile(vec2, seq(0,1,.1))
    0%    10%    20%    30%    40%    50%    60%    70%    80%    90%   100% 
27.190 27.289 27.330 27.357 27.376 27.400 27.424 27.440 27.476 27.522 30.070
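The real data and the impute_zero_resp_all functions are not shown, so the timings above cannot be reproduced directly. A rough sketch of the same timing pattern on synthetic data, assuming a hypothetical per-group function impute_group and an arbitrary table of 200 groups (both are illustrative stand-ins, not the original code or data):

```r
library(data.table)

# Hypothetical stand-in for the per-group imputation step:
# appends num copies of the group's first row.
impute_group <- function(x, num) {
  first <- x[1L]
  rbindlist(list(x, first[rep(1L, num)]))
}

# Synthetic data: 200 groups of 5 rows each (sizes chosen arbitrarily)
set.seed(1)
n_groups <- 200L
synthetic <- data.table(ID = rep(seq_len(n_groups), each = 5L),
                        b1 = rnorm(n_groups * 5L),
                        b2 = rnorm(n_groups * 5L))

# One run to inspect the result
out <- synthetic[, impute_group(.SD, num = 5L), by = ID]

# Elapsed time over repeated runs, collected without a pre-allocated loop
timings <- vapply(seq_len(3L), function(i) {
  system.time(synthetic[, impute_group(.SD, num = 5L), by = ID])[["elapsed"]]
}, numeric(1))

summary(timings)
```

Selecting the "elapsed" element by name is equivalent to the [3] index used above, but makes it explicit which of the three timings is being compared.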

sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

0 answers