Question

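I am writing an R package and have several numeric vectors that users will often pass as arguments to various package functions. What is the best way to store these vectors in the package so that users can access them easily?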

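One idea I had was to save each vector as a data file in inst/data. Users could then use the name of the data file in place of the vector whenever they need it (at least, I can do this during development). I like this idea, but I am not sure whether it would violate CRAN rules or conventions, or cause any problems.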

# To create one such vector as a data file
octants <- c(90, 135, 180, 225, 270, 315, 360, 45)
devtools::use_data(octants)
# To access this vector in usage
my_function(data, octants)

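My other idea was to create a separate function that returns the required vector. Users could then call the appropriate function when needed. This might be better than data for some reason, but I worry about users forgetting the () after the function name.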

# To create the vector within a function
octants <- function() c(90, 135, 180, 225, 270, 315, 360, 45)
# To access this vector in usage
my_function(data, octants()) # works
my_function(data, octants) # doesn't work

Does anyone know which solution is preferable, or whether there is a better alternative?

Answer 1

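I asked myself the same question and spent a long time reading the manuals carefully. Do it: it is a good idea, it is useful, and there are tools to help you. The Writing R Extensions manual describes the formats in which you can save data and how to follow R standards.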

My recommendation for providing data in a package is to use:

devtools::use_data(...,internal=FALSE,overwrite=TRUE)

where ... stands for the unquoted names of the datasets you want to save.

https://www.rdocumentation.org/packages/devtools/versions/1.13.3/topics/use_data

To create the dataset, you only need to create a script file in the inst subdirectory of your package. My own example is https://github.com/cran/stacomiR/blob/master/inst/config/generate_data.R

For example, I use it to create the r_mig dataset:

#################################
# generates dataset for report_mig
# from the vertical slot fishway located at the estuary of the Vilaine (Brittany)
# Taxa Liza Ramada (Thinlip grey mullet) in 2015
##################################

#{ here some stuff necessary to generate this dataset from my package
# and database}
setwd("C:/workspace/stacomir/pkg/stacomir")
devtools::use_data(r_mig,internal=FALSE,overwrite=TRUE)

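This saves your dataset in the appropriate format. With internal = FALSE, it is accessible to all users through data(). I suggest you read the data() help file. You can use data() to access your files even when you are not inside a package, provided they are in a data subdirectory: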

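If lib.loc and package are both NULL (the default), the data sets are searched for in all the currently loaded packages then in the 'data' directory (if any) of the current working directory.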
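The lookup rule quoted above can be tried outside any package. The following is a minimal sketch, not the answerer's code: it assumes that base save() is an acceptable stand-in for use_data() (which also writes one .rda file per object under data/), and it works in a temporary directory so nothing real is touched. The envir argument is passed explicitly so the object is loaded into the calling environment.

```r
# Work in a throwaway directory that contains a 'data' subdirectory.
old_wd <- setwd(tempdir())
dir.create("data", showWarnings = FALSE)

# The octants vector from the question, saved as data/octants.rda.
# save() is used here as an assumed stand-in for use_data().
octants <- c(90, 135, 180, 225, 270, 315, 360, 45)
save(octants, file = file.path("data", "octants.rda"))
rm(octants)

# With package and lib.loc left at NULL, data() also searches ./data
# of the current working directory.
data("octants", envir = environment())
print(octants)

# Restore the original working directory.
setwd(old_wd)
```

Inside an installed package the same file would be found through the package itself, so users would not need to change their working directory.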

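If you are using Roxygen, create an R file called data.R in which you store the descriptions of all your datasets. Below is an example of the Roxygen documentation for one of the datasets in the stacomiR package.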

#' Video counting of thin lipped mullet (Liza ramada) in 2015 in the Vilaine (France)
#' 
#' This dataset corresponds to the data collected at the vertical slot fishway
#' in 2015, video recording of the thin lipped mullet Liza ramada migration
#'
#' @format An object of class report_mig with 8 slots:
#' \describe{
#'   \item{dc}{the \code{ref_dc} object with 4 slots filled with data corresponding to the iav postgres schema}
#'   \item{taxa}{the \code{ref_taxa} the taxa selected}
#'   \item{stage}{the \code{ref_stage} the stage selected}
#'   \item{timestep}{the \code{ref_timestep_daily} calculated for all 2015}
#'   \item{data}{ A dataframe with 10304 rows and 11 variables
#'          \describe{
#'              \item{ope_identifiant}{operation id}
#'              \item{lot_identifiant}{sample id}
#'              \item{ope_dic_identifiant}{dc id}
#'              \item{lot_tax_code}{species id}
#'              \item{lot_std_code}{stage id}
#'              \item{value}{the value}
#'              \item{type_de_quantite}{either effectif (number) or poids (weights)}
#'              \item{lot_dev_code}{destination of the fishes}
#'              \item{lot_methode_obtention}{method of data collection, measured, calculated...} 
#'              }
#'   }
#'   \item{coef_conversion}{A data frame with 0 observations : no quantity are reported for video recording of mullets, only numbers}
#'   \item{time.sequence}{A time sequence generated for the report, used internally}
#' }
#' @keywords data
"r_mig"

The full file is here:

https://github.com/cran/stacomiR/blob/master/R/data.R

For another example, read: http://r-pkgs.had.co.nz/data.html#documenting-data
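For a plain numeric vector such as the octants vector from the question, the data.R entry can be much shorter than the r_mig one. This is only a sketch: the title, the description wording, and the choice of the datasets keyword are assumptions, not taken from the stacomiR package.

```r
#' Octant angles in degrees
#'
#' The eight octant angles, stored as a plain numeric vector so that
#' users can pass it directly as a function argument.
#'
#' @format A numeric vector of length 8.
#' @keywords datasets
"octants"
```

The quoted name on the last line tells Roxygen which dataset the block documents, in the same way as "r_mig" in the example above.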

You can then load the data by calling data("r_mig"), for example in the following test:

test_that("Summary method works",
    {
     ... #some other code

      data("r_mig")
      r_mig<-calcule(r_mig,silent=TRUE)
      summary(r_mig,silent=TRUE)
      rm(list=ls(envir=envir_stacomi),envir=envir_stacomi)
    })

Most importantly, you can use these datasets in the manual to show how the functions in your package are used.
