Question

I created a vector, shown below:

Expenditure
 [1] 13.9 15.4 15.8 17.9 18.3 19.9 20.6 21.4 21.7 23.1
[11] 20.0 20.6 24.0 25.1 26.2 30.0 30.6 30.9 33.8 44.1

Now I have drawn a random sample of 10 values from Expenditure:

ransomsample <- sample(Expenditure,10)
ransomsample
 [1] 19.9 21.4 20.0 30.0 17.9 25.1
 [7] 26.2 21.7 33.8 13.9

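Now, having created the sample named ransomsample, I want to find the remaining items in Expenditure, that is, the values that were not drawn. Is there an existing function I can use?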

Answer 1

This should do it:

#generate 20 random numbers
x <- rnorm(20)
#sample 10 of them
randomSample <- sample(x, 10, replace = FALSE)

#we can get the ones we sampled with:
x[x %in% randomSample]

#Let's confirm this. NOTE - added sort() to easily see they do match
cbind(sort(randomSample), sort(x[x %in% randomSample]))

#So we want to negate the above
x[!(x %in% randomSample)]
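
As a minimal sketch, the same negated %in% filter can be applied to the vectors from the question. The values are retyped here from the printed output in the question, and the name remaining is only illustrative:

```r
# Vector and sample copied from the question's printed output
Expenditure <- c(13.9, 15.4, 15.8, 17.9, 18.3, 19.9, 20.6, 21.4, 21.7, 23.1,
                 20.0, 20.6, 24.0, 25.1, 26.2, 30.0, 30.6, 30.9, 33.8, 44.1)
ransomsample <- c(19.9, 21.4, 20.0, 30.0, 17.9, 25.1, 26.2, 21.7, 33.8, 13.9)

# Keep every element of Expenditure whose value was not drawn
remaining <- Expenditure[!(Expenditure %in% ransomsample)]
remaining
# values: 15.4 15.8 18.3 20.6 23.1 20.6 24.0 30.6 30.9 44.1
```

Both copies of 20.6 survive here because neither was drawn. The filter works on values, not positions, so it gives the expected 10 leftovers only when no drawn value also occurs elsewhere in the vector.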

Answer 2

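How you approach this depends on how you need to handle duplicates in the vector you are sampling from. If you can be sure there are no duplicates, then the simple method given by @Chase, x[!(x %in% randomSample)], is perfect. If duplicates are possible, however, more care is needed. We can see this clearly below: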

# Start with a vector (length=9) replete with replicates
x <- rep(letters[1:3],3)

# Now sample 8 of its 9 values (leaving one unsampled)
set.seed(123)
randomSample <- sample(x, 8, replace = FALSE)

# try using simple method to find which value remains after sampling
x[!(x %in% randomSample)]
## character(0)

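The simple method fails because %in% matches every occurrence in x of each sampled value, so any value that was drawn even once is removed entirely. If that is what you want, then this is your method. But if you want to know how many of each value remain after sampling, we need a different tack.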

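There are several ways, but the most elegant is to subtract the frequency table of the sample from the frequency table of the initial vector, which gives a table of the remaining unsampled values. The vector of unsampled values is then generated from that table.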

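# Frequency tables of the original vector and of the sample
xtab <- as.data.frame(table(x))
stab <- as.data.frame(table(randomSample))
# Subtract each sampled count from the matching row of xtab
rows <- match(stab$randomSample, xtab$x)
xtab$Freq[rows] <- xtab$Freq[rows] - stab$Freq
# Expand the leftover counts back into a vector of unsampled values
rep(xtab$x, xtab$Freq)
## [1] a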
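
A different route, not taken from the answers above, is to sample positions instead of values, so duplicates never need to be untangled afterwards. This is a minimal sketch that assumes you still control the sampling step; the names idx and remaining are illustrative:

```r
# Same vector with repeated values as above
x <- rep(letters[1:3], 3)

set.seed(123)
# Draw 8 of the 9 positions rather than 8 of the 9 values
idx <- sample(seq_along(x), 8, replace = FALSE)

randomSample <- x[idx]   # the sampled values
remaining    <- x[-idx]  # everything at the positions that were not drawn
remaining
```

Because each position is unique, x[-idx] returns exactly the elements left behind, however many times their values repeat. It does not help if the sample was already drawn by value, in which case the frequency-table subtraction still applies.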

How do I find the remaining items in a vector after sampling?
