R function fails only when called through my package

Posted: 2015-09-03 13:59:47

Tags: r parallel-processing data.table roxygen2

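I'm building a package for work (called mypackage), and there is a function I want to include in it. The function seems to work perfectly fine when I call it like this:

volumeDiffBoot.test()

However, it fails when I call it like this:

mypackage::volumeDiffBoot.test()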

I'm using roxygen2 to build my package. The function looks like this:

volumeDiffBoot.test <- function(screenedData, B=100, recSetting=c(80, 6), 
                                curSetting=c(80, 6), numCore=3){
  inputStrings <- unique(screenedData$ID)
  cl <- makeCluster(numCore)
  # must pass all relevant variables to the worker nodes: 
  clusterExport(cl, list=c("inputStrings", "B", "screenedData", "curSetting", "recSetting"), 
                envir=environment())
  clusterEvalQ(cl, library(data.table))
  # change the data.frame to a data.table (MUCH faster this way)
  b <- as.data.table(screenedData)
  setkey(b, ID) # set the key for faster subsetting
  # bootstrap sampling of volume differences:
  bootSamples <- parLapply(cl, as.matrix(1:B), function(i){
    bootSample1 <- sample(inputStrings, replace=TRUE)
    bootSample2 <- sample(inputStrings, replace=TRUE)
    numHits <- lapply(1:length(bootSample1), function(j){
      # subsets the data by ID first using data.table key (much faster this way): 
      d1 <- b[list(bootSample1[j])]
      # return the number of rows meeting the accuracy and variation conditions: 
      curHits <- d1[accuracy >= curSetting[1] & numVariation <= curSetting[2], .N]

      d2 <- b[list(bootSample2[j])]
      recHits <- d2[accuracy >= recSetting[1] & numVariation <= recSetting[2], .N]
      return(c(curHits, recHits))
    })
    q <- do.call(rbind, numHits)
    return(sum(q[,1]) - sum(q[,2]))
  })
  stopCluster(cl) # close the cluster
  bootSamples <- unlist(bootSamples)
  cat("If the following confidence interval contains zero, the difference in volume is not significant.\n")
  print(quantile(bootSamples, c(0.025, 0.975)))
  return(bootSamples)
}

Here is some data to test with:

myDat <- structure(list(accuracy = c(0L, 0L, 100L, 100L, 100L, 100L, 100L, 
100L, 85L, 73L, 0L, 0L, 90L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 100L, 
100L, 100L, 94L, 100L), ID = c(1016L, 1017L, 1019L, 1014L, 1016L, 
1010L, 1003L, 1005L, 1008L, 1015L, 1016L, 1008L, 1006L, 1012L, 
1001L, 1004L, 1011L, 1009L, 1010L, 1007L, 1008L, 1006L, 1002L, 
1014L, 1019L), numVariation = c(15, 11, 0, 0, 0, 0, 0, 0, 2, 
4, 14, 10, 1, 8, 9, 9, 15, 15, 14, 11, 0, 0, 0, 1, 0)), .Names = c("accuracy", 
"ID", "numVariation"), row.names = c(NA, 25L), class = "data.frame")

Here is my sessionInfo():

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] myPackage_0.1 ggplot2_1.0.1     data.table_1.9.4  dplyr_0.4.2       stringr_1.0.0     doSNOW_1.0.12    
 [7] snow_0.3-13       iterators_1.0.7   foreach_1.4.2     digest_0.6.8     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0      magrittr_1.5     MASS_7.3-43      munsell_0.4.2    colorspace_1.2-6 R6_2.1.1         plyr_1.8.3      
 [8] tools_3.2.1      grid_3.2.1       gtable_0.1.2     DBI_0.3.1        assertthat_0.1   reshape2_1.4.1   codetools_0.2-11
[15] stringi_0.5-5    scales_0.3.0     chron_2.3-47     proto_0.3-10    

This is the error thrown when the function is called through the package:

> d <- mypackage::volumeDiffBoot.test(myDat, B=3, recSetting = c(88, 2), curSetting = c(80, 6))
Error in checkForRemoteErrors(val) : 
  3 nodes produced errors; first error: invalid subscript type 'list'

1 Answer

Answer (score: 4)

Following up on my comment, now that you've added the error message:

Add the line import(data.table) to your NAMESPACE file, below the exportPattern("^[^\\.]") line, then rebuild and reinstall your package.
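Roughly, the relevant part of NAMESPACE would then look like this. It is only a sketch showing the two lines in question; anything else already in your file stays as it is. Per the FAQ entry, data.table also has to be listed in the Imports: field of your DESCRIPTION file for this to be valid.

```r
# NAMESPACE (sketch, showing only the relevant lines)
# Export every object whose name does not start with a dot.
exportPattern("^[^\\.]")
# Import all of data.table's exports, so that [.data.table treats
# code in this package as data.table-aware.
import(data.table)
```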

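I had a similar problem. Notice that the error complains about a subscript of type 'list'. That comes from the data.table join syntax b[list(bootSample1[j])]: when the function is called from your package, that syntax is treated as invalid. (I'm sure my terminology here is off, which is just a sign that I don't understand the problem deeply.)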
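To see where the message comes from: when data.table decides that the calling code is not data.table-aware, [.data.table hands the call over to ordinary data.frame subsetting, and a data.frame cannot be indexed by a list. Below is a minimal base R reproduction, using a few rows from myDat. The names df, msg and dt are only for illustration, and the second half runs only if data.table is installed.

```r
# A few rows of the question's myDat as a plain data.frame.
df <- data.frame(
  accuracy     = c(0L, 0L, 100L),
  ID           = c(1016L, 1017L, 1019L),
  numVariation = c(15, 11, 0)
)

# data.frame subsetting passes a list index on to base list subsetting,
# which rejects it with the same message the cluster nodes reported.
msg <- tryCatch(df[list(1016L)], error = function(e) conditionMessage(e))
print(msg)

# The same keyed-join syntax works when data.table handles the call,
# e.g. from the global environment (skipped if data.table is missing).
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  data.table::setkey(dt, ID)
  print(dt[list(1016L)])  # the row whose ID is 1016
}
```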

Importing data.table through the NAMESPACE solved the problem.
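Since you build with roxygen2: if roxygen2 is generating your NAMESPACE, a hand edit there can be lost the next time you re-document, so it is probably safer to declare the import as a roxygen tag. A minimal sketch, assuming a package-level documentation block in a file such as R/mypackage.R (the file name and the @importFrom line for parallel are my own suggestions, not something from your question):

```r
#' mypackage: bootstrap tests for volume differences
#'
#' Package-level roxygen2 block (sketch). @import makes roxygen2 write
#' import(data.table) into NAMESPACE.
#'
#' Optional extra: import the parallel functions the package uses, so they
#' no longer depend on parallel being attached in the user's session.
#'
#' @import data.table
#' @importFrom parallel makeCluster clusterExport clusterEvalQ parLapply stopCluster
#' @docType package
#' @name mypackage
NULL
```

Then regenerate NAMESPACE with roxygen2::roxygenise() (or devtools::document()) and reinstall. If you keep the @importFrom parallel line, parallel also belongs in Imports: in DESCRIPTION. Recent roxygen2 versions refuse to overwrite a NAMESPACE they did not generate, so you may have to delete a hand-written one first.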

More specifically, I found this answer in FAQ entry 6.9:

  

<强> 6.9   我创建了一个依赖于data.table的包。我如何能   确保我的包是data.table-aware,以便从中继承   data.frame有效吗?

     

无论哪种   i)在DESCRIPTION文件的Depends:字段中包含data.table,或ii)包括   您的描述文件的导入:字段中的data.table和导入(data.table)   NAMESPACE文件。
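To confirm that the rebuilt package really imports data.table (route ii), you can inspect its namespace from R. getNamespaceImports() is base R; imports_package() is just a hypothetical helper name, and it only checks the NAMESPACE import, not the Depends: route.

```r
# Hypothetical helper: does the namespace `pkg` import anything from `dep`?
imports_package <- function(pkg, dep = "data.table") {
  dep %in% names(getNamespaceImports(pkg))
}

# Sanity check on a base package: every namespace imports base.
imports_package("stats", "base")

# After rebuilding and reinstalling, this should be TRUE
# (package name assumed to be "mypackage"):
# imports_package("mypackage")
```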