Question

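I am trying to write a function that takes a data frame and then generates subset data frames inside a for() loop. As a first step, I tried the following: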

dfcreator<-function(X,Z){
  for(i in 1:Z){
  df<-subset(X,Stratum==Z)    #build dataframe from observations where index=value
  assign(paste0("pop", Z),df) #name dataframe
 }
}

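However, this does not save anything to memory, and when I try to specify a return(), I still do not get what I need. For reference, I am using the Swedish data set (native to RStudio).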

Edit, per Melissa's suggestion:

I tried to implement the following code:

sampler <- function(df, n,...) {
  return(df[sample(nrow(df),n),])
}

sample_list<-map2(data_list, stratumSizeVec, sampler)

where stratumSizeVec is a 1x7 df and data_list is a list of seven dfs. When I do this, I get seven samples in sample_list, all of size stratumSizeVec[1]. Why does map2 not feed the inputs in as follows:

sampler(data_list$pop0,stratumSizeVec[1])
sampler(data_list$pop1,stratumSizeVec[2])

...

sampler(data_list$pop6,stratumSizeVec[7])

Also, is there a way to "nest" the map2 function inside lapply?

Answer 1

I'm confused as to why you never actually use i anywhere in your loop. It looks like you are creating Z copies of the subset of the data set where Stratum == Z. Is that what you are after?

As for your code, I would use the following:

data_list <- split(df, df$Stratum)
names(data_list) <- paste0("pop", sort(unique(df$Stratum)))

This does not define a function. Instead, we call the base-R function split(), which splits a data frame based on some vector (here we use df$Stratum). The result is a list of data frames, each containing a single value of Stratum.

Randomly sampling from rows

sampled_data <- lapply(data_list, function(df, n, ...) {
  # n is the number of rows to take, the dots let you send other
  # information to the `sample` function.
  df[sample(nrow(df), n, ...), ]
},
n = 5,
replace = FALSE # this is default, but the purpose of using the ... notation is to allow this (and any other options in the `sample` function) to be changed.
)

You can also define the function separately:

sampler <- function(df, n, ...) {
  df[sample(nrow(df), n, ...), ]
}
sampled_data <- lapply(data_list, sampler, n = 10) # replace 10 with however many samples you want.
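To make the split-then-sample steps concrete, here is a minimal, self-contained sketch on a made-up data frame. The toy data, the value column, and the seed are assumptions for illustration only; they are not the asker's actual data set.

```r
# Toy data: 3 strata with 4 rows each (hypothetical, for illustration only).
toy <- data.frame(
  Stratum = rep(0:2, each = 4),
  value   = seq_len(12)
)

# Split into one data frame per stratum and name them pop0, pop1, pop2.
data_list <- split(toy, toy$Stratum)
names(data_list) <- paste0("pop", sort(unique(toy$Stratum)))

# Draw n rows at random from a data frame; ... is forwarded to sample().
sampler <- function(df, n, ...) {
  df[sample(nrow(df), n, ...), ]
}

set.seed(1)  # only so the draw is reproducible
sampled_data <- lapply(data_list, sampler, n = 2)

names(sampled_data)         # "pop0" "pop1" "pop2"
sapply(sampled_data, nrow)  # 2 rows from each stratum
```

Because the result is an ordinary named list, an individual subset can be retrieved with sampled_data$pop0 instead of being created through assign().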

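purrr: map2 method

As defined, the sampler function does not need to be modified: each element of the first list (data_list) is passed as the first argument of sampler, and the corresponding element of the second "list" (sampleSizeVec) is passed as the second argument.

library(purrr)
map2(data_list, sampleSizeVec, sampler, replace = FALSE) # replace = FALSE not needed, there as an example only.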
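The pairing that map2() performs can be checked with a small sketch, again on made-up data. The toy data frame and the sizes are hypothetical, and sampleSizeVec is assumed here to be a plain numeric vector with one entry per element of data_list. The final lapply() call is only one possible reading of what "nesting" map2 inside lapply could mean: repeating the whole stratified draw several times.

```r
library(purrr)

# Hypothetical toy data: 3 strata with 5 rows each.
toy <- data.frame(Stratum = rep(0:2, each = 5), value = seq_len(15))
data_list <- split(toy, toy$Stratum)
names(data_list) <- paste0("pop", sort(unique(toy$Stratum)))

# Draw n rows at random from a data frame; ... is forwarded to sample().
sampler <- function(df, n, ...) {
  df[sample(nrow(df), n, ...), ]
}

# One sample size per stratum (assumed to be a plain numeric vector).
sampleSizeVec <- c(1, 2, 3)

set.seed(42)  # only so the draw is reproducible
sample_list <- map2(data_list, sampleSizeVec, sampler)

# Element i of data_list is paired with element i of sampleSizeVec,
# so the samples have 1, 2 and 3 rows respectively.
sapply(sample_list, nrow)

# Assumed interpretation of "nesting": repeat the stratified draw 3 times.
repeated <- lapply(seq_len(3), function(r) {
  map2(data_list, sampleSizeVec, sampler)
})
length(repeated)  # 3 independent sets of stratum samples
```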
