Question

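I am trying to create some simulated data. To create clustered data, I have specified whether each prescriber works in one or in several local health areas (LHAs). Now I am trying to assign prescribers to patients based on their LHA. The code for that is in the following block.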

for (i in seq_along(data$LHA)) {
  data$prescriber_id[i] <- sample(
    x = number_of_LHAs_worked$prescriber_id[
      number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]],
    size = 1)
}

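This loop works well for prescribers in multiple LHAs (i.e., when the x given to the sample function has length greater than 1). However, it fails when x comes out with length 1 (the case where a prescriber works in only one LHA), because of how the sample function behaves.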

sample(x = 154, size = 1)

When x is given as a single number, R creates an index from 1 to x and then randomly chooses a number in that range.

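While I have worked out a solution for my purposes, I am interested to know whether others have found ways to make the sample function behave more consistently; specifically, to force sample to use only the specified set.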

sample(x = 154:155, size = 1)    # here the function chooses only a number in the set {154, 155}.

Answer 1

?sample supplies an answer in its examples:

set.seed(47)

resample <- function(x, ...) x[sample.int(length(x), ...)]

# infers 100 means 1:100
sample(100, 1)
#> [1] 98

# stricter
resample(100, 1)
#> [1] 100

# still works normally if explicit
resample(1:100, 1)
#> [1] 77
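
Applied to the loop in the question, the helper could be used roughly as sketched below. The two data frames are small made-up stand-ins for data and number_of_LHAs_worked (their contents are assumptions, not the asker's data); LHA 2 has exactly one prescriber, which is the case that trips up plain sample.

```r
# Helper from the examples in ?sample: always samples from the elements of x
resample <- function(x, ...) x[sample.int(length(x), ...)]

# Hypothetical stand-in data (not from the question)
number_of_LHAs_worked <- data.frame(
  prescriber_id   = c(101, 102, 154),
  assigned_LHAs_2 = c(1, 1, 2)   # LHA 2 has a single prescriber
)
data <- data.frame(LHA = c(1, 2, 2, 1), prescriber_id = NA_real_)

for (i in seq_along(data$LHA)) {
  # All prescribers who work in this patient's LHA
  candidates <- number_of_LHAs_worked$prescriber_id[
    number_of_LHAs_worked$assigned_LHAs_2 == data$LHA[i]]
  # Picks one element of candidates, even when there is only one
  data$prescriber_id[i] <- resample(candidates, size = 1)
}
```

With this stand-in data, patients in LHA 2 always get prescriber 154, and patients in LHA 1 get either 101 or 102. An LHA with no prescribers at all would still raise an error, since there is nothing to sample from.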
