How can I sample records from a database without repetition?

Date: 2019-11-03 13:51:38

Tags: r

Good afternoon, my question is as follows:

I have a database named friends:

library(tibble)  # data_frame() is deprecated; tibble() is its replacement
friends <- tibble(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome", "peter", "yassine", "karim"),
  age = c(27, 26, 30, 31, 31, 38, 39),
  height = c(180, 178, 190, 185, 187, 160, 158),
  married = c("M", "M", "N", "N", "N", "M", "M")
)

library(intervals)  # provides Intervals()
i <- Intervals(
  matrix(
    c(0,5000,  
      0,5000,
      7000,10000,  
      7000,10000,
      7000,10000,
      10000,15000,  
      10000,15000
    ),
    byrow = TRUE,
    ncol = 2
  ),
  closed = c( TRUE, TRUE ),
  type = "R"
) 

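I need to create a function that takes this database as an argument.

The function should sample one row (for example, if the fourth row is sampled once, the function must not select that row again in later executions) and then perform some operations on it.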

sampling_fct<-function(data){

data[sample(nrow(data), 1), ]

# sample a given row only one time  

}

If we have 5 rows, the selections should look like:

data[3]

data[2]

data[5]

data[4]

data[1]

where data = friends.

I should not get repeated results.

I hope my question is clear.

Thank you!

2 answers:

Answer 0 (score: 1)

I think you are looking for something like this:

#Input data
friends <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome", "peter", "yassine", "karim"),
  age = c(27, 26, 30, 31, 31, 38, 39),
  height = c(180, 178, 190, 185, 187, 160, 158),
  married = c("M", "M", "N", "N", "N", "M", "M")
)

#Random row draw function
#Takes the dataframe and a list of forbidden row values as input
tst_func <- function(data, verbot_list){
  if(length(verbot_list) == nrow(data)){
    stop("ERROR: no possible rows left to be sampled.")
  } else {
    repeat{
      curnum <- as.integer(sample(1:nrow(data), 1))
      if(!(curnum %in% verbot_list)){
        break
      }
    }
    verbot_list <- c(verbot_list, curnum)
    #data[curnum, ]
    return(list(data[curnum, ], verbot_list))
  }
}

#Initialization of empty list in parent env. that maintains rows that cannot be drawn from anymore
rm_list <- c()

#Example run
tstval <- tst_func(friends, rm_list)

tstrow <- tstval[[1]]
tstrow
#      name age height married
# 1 Nicolas  27    180       M

rm_list <- tstval[[2]]
rm_list
# [1] 1

Once all possible rows have been (randomly) drawn:

rm_list
# [1] 1 5 3 4 6 2 7

the function exits with an error:

tstval <- tst_func(friends, rm_list)
# Error in tst_func(friends, rm_list) : 
#   ERROR: no possible rows left to be sampled.

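(To draw random rows repeatedly, simply call the function inside a loop, passing the updated list of drawn rows back in each time.)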
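
The loop mentioned above could look roughly like the following minimal sketch. It includes a condensed copy of tst_func so that it runs on its own. The three-row data frame df and the variable names used, drawn and result are illustrative assumptions, not part of the original answer.

```r
# Small illustrative data frame (hypothetical; stands in for friends)
df <- data.frame(name = c("a", "b", "c"), age = c(27, 26, 30))

# Condensed copy of tst_func from the answer above
tst_func <- function(data, verbot_list) {
  if (length(verbot_list) == nrow(data)) {
    stop("ERROR: no possible rows left to be sampled.")
  }
  repeat {
    curnum <- sample(seq_len(nrow(data)), 1)
    if (!(curnum %in% verbot_list)) break
  }
  list(data[curnum, ], c(verbot_list, curnum))
}

used <- c()      # row indices drawn so far
drawn <- list()  # sampled rows, in draw order

# Call the function once per row, feeding the updated index vector back in
for (k in seq_len(nrow(df))) {
  res <- tst_func(df, used)
  drawn[[k]] <- res[[1]]
  used <- res[[2]]
}

# Every row of df appears exactly once, in random order
result <- do.call(rbind, drawn)
```

Because each draw retries until it hits an unused index, the last few draws on a large data frame may need many attempts; for big inputs, shuffling the indices once is likely cheaper.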

Answer 1 (score: 0)

To make sure a given row is sampled only once, you can use sample(replace = FALSE) (see: R Examples of sample()).

Given the dataset, consider using:

friends <- data.frame(
      name = c("Nicolas", "Thierry", "Bernard", "Jerome", "peter", "yassine", "karim"),
      age = c(27, 26, 30, 31, 31, 38, 39),
      height = c(180, 178, 190, 185, 187, 160, 158),
      married = c("M", "M", "N", "N", "N", "M", "M")
    )

sampling_fct<-function(data){

  # replace = FALSE guarantees that no row is drawn twice within one call
  data[sample(nrow(data), size = 6, replace = FALSE), ]

}

mylist <- list(friends, friends, friends)

mylist_sampled <- lapply(mylist,sampling_fct)
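
Sampling all row indices without replacement yields a random permutation, so one way to get "one new row per call" is to shuffle the indices once and hand them out one at a time. The sketch below does this with a closure. The names make_sampler and next_row are hypothetical, introduced here only for illustration, and returning NULL once the rows are exhausted is an assumed design choice rather than something the question specifies.

```r
# Returns a function that yields one previously unseen row per call
make_sampler <- function(data) {
  # Random permutation of 1..nrow(data); sample.int() draws without replacement by default
  order_idx <- sample.int(nrow(data))
  pos <- 0
  function() {
    if (pos >= length(order_idx)) return(NULL)  # all rows already handed out
    pos <<- pos + 1                             # advance the position kept in the closure
    data[order_idx[pos], , drop = FALSE]
  }
}

friends <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome", "peter", "yassine", "karim"),
  age = c(27, 26, 30, 31, 31, 38, 39),
  height = c(180, 178, 190, 185, 187, 160, 158),
  married = c("M", "M", "N", "N", "N", "M", "M")
)

next_row <- make_sampler(friends)

# Draw until the sampler reports that no rows are left
rows <- list()
repeat {
  r <- next_row()
  if (is.null(r)) break
  rows[[length(rows) + 1]] <- r
}
all_rows <- do.call(rbind, rows)
```

Each call to next_row() returns a different row of friends, and the state lives inside the closure, so the caller does not have to pass a list of used rows around.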