Question

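I am new to R and have written a function that needs to run multiple times to build the final dataset. The number of runs is determined by a vector of unique years, and on each run the function should produce output for one of those years. I am still not getting the correct output.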

Desired output: for example, if 10 samples are needed per year, then after the 10th run I should have 100 rows of correct output.

create_strsample <- function(n1,n2){   
   yr <- c(2010,2011,2012,2013)
   for(i in 1:length(yr)){

     k1<-subset(data,format(as.Date(data$account_opening_date),"%Y")==yr[i])
     r1 <-sample(which(!is.na(k1$account_closing_date)),n1,replace=FALSE)
     r2<-sample(which(is.na(k1$account_closing_date)),n2,replace=FALSE)
     #final.data <-k1[c(r1,r2),]
     sample.data <- lapply(yr, function(x) {f.data<-create_strsample(200,800)})

     k1 <- do.call(rbind,k1)
     return(k1)
   }

   final <- do.call(rbind,sample.data)
   return(final)
}
stratified.sample.data <- create_strsample(200,800)

Answer 1

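An MWE would have been nice, but I'll give you a template for this kind of problem. Note that it is not optimized for speed (or anything else); it is only meant to be easy to understand.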

As noted in the comments, the call to create_strsample inside the loop looks odd and is probably not what you actually want.

data <- data.frame()         # we need an empty, but existing variable for the first loop iteration
for (i in 1:10) {    
    temp <- runif(1,max=i)   # do something...
    data <- rbind(data,temp) # ... and add this to 'data'
}                            # repeat 10 times
rm(temp)                     # don't need this anymore
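
The template above grows 'data' with rbind on every iteration, which is easy to read but copies the data frame each time. If speed matters later, one common variation is to collect the pieces in a list and bind them once at the end. A minimal sketch, where the names pieces and result and the column name value are illustrative and not from the original post:

```r
set.seed(1)                                 # only to make the sketch reproducible

pieces <- vector("list", 10)                # preallocate one slot per iteration
for (i in 1:10) {
  # do something... here: one random draw between 0 and i
  pieces[[i]] <- data.frame(value = runif(1, max = i))
}
result <- do.call(rbind, pieces)            # bind all pieces in a single step
```

The result has one row per iteration, just like the template, but rbind is called only once.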

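The return(k1) inside the loop also looks wrong: return() exits the function immediately, so the loop never gets past its first iteration.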
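
To illustrate the point about return() inside a loop, here is a small sketch. The function names returns_early and collects_all are hypothetical and exist only for this comparison:

```r
# return() inside the loop: the function exits during the first iteration
returns_early <- function(years) {
  for (y in years) {
    return(y)
  }
}

# accumulate inside the loop, return once after it has finished
collects_all <- function(years) {
  out <- NULL
  for (y in years) {
    out <- rbind(out, data.frame(year = y))  # rbind(NULL, x) gives x
  }
  out
}

yr <- c(2010, 2011, 2012, 2013)
early <- returns_early(yr)      # only the first year comes back
all_years <- collects_all(yr)   # one row per year
```

The second pattern is the one the template follows: build up the result in the loop and return it afterwards.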

Answer 2

Following your suggestion, @herbaman, I tried this and got the desired output, without the lapply.

create_strsample <- function(n1, n2) {
  final.data <- NULL                      # accumulator; rbind(NULL, x) returns x
  yr <- c(2010, 2011, 2012, 2013)

  for (i in seq_along(yr)) {
    # accounts opened in year yr[i]; 'data' is read from the calling environment
    k1 <- subset(data, format(as.Date(data$account_opening_date), "%Y") == yr[i])
    # n1 closed accounts and n2 still-open accounts, sampled without replacement
    r1 <- k1[sample(which(!is.na(k1$account_closing_date)), n1, replace = FALSE), ]
    r2 <- k1[sample(which(is.na(k1$account_closing_date)), n2, replace = FALSE), ]
    sample.data <- rbind(r1, r2)
    final.data <- rbind(final.data, sample.data)
  }

  return(final.data)
}

stratified.sample.data <- create_strsample(200, 800)
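
For anyone who wants to try this without the original data, below is a self-contained sketch of the same idea. Everything about the data is an assumption: accounts is a small synthetic stand-in for 'data', built so that each year has 10 closed and 10 open accounts, and sample_year is a hypothetical helper. It passes the data frame in as an argument instead of relying on a global, and uses lapply plus a single do.call(rbind, ...) in place of the explicit loop.

```r
set.seed(42)                                   # reproducible sampling
years <- c(2010, 2011, 2012, 2013)

# Synthetic stand-in for 'data': 20 accounts per year, none closed yet
accounts <- data.frame(
  account_opening_date = as.Date(paste0(rep(years, each = 20), "-06-15")),
  account_closing_date = as.Date(NA)
)
# Mark every other account as closed 30 days after opening
closed <- seq(1, nrow(accounts), by = 2)
accounts$account_closing_date[closed] <- accounts$account_opening_date[closed] + 30

# Hypothetical helper: sample n_closed closed and n_open open accounts for one year
sample_year <- function(df, year, n_closed, n_open) {
  in_year <- df[format(df$account_opening_date, "%Y") == as.character(year), ]
  closed_rows <- which(!is.na(in_year$account_closing_date))
  open_rows   <- which(is.na(in_year$account_closing_date))
  in_year[c(sample(closed_rows, n_closed), sample(open_rows, n_open)), ]
}

# One data frame per year, bound together once at the end
stratified <- do.call(
  rbind,
  lapply(years, function(y) sample_year(accounts, y, n_closed = 3, n_open = 7))
)
```

With 3 closed and 7 open accounts per year, stratified has 10 rows for each of the four years. One caveat with this sketch: sample(x, n) treats a length-one numeric x as 1:x, so a year with exactly one closed or open account would need extra handling.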
