Question

Say I have a simple vector with repeated elements:

a <- c(1,1,1,2,2,3,3,3)

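Is it possible to randomly select one element from each group of repeated elements? That is, a random draw indicating which elements to keep: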

1,4,6 ## here I selected the first 1, the first 2 and the first 3

Another one:

1,5,8 ## here I selected the first 1, the second 2 and the third 3

I could loop over each repeated element, but surely there must be a faster way?
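
For reference, the loop approach mentioned above might look roughly like this. It is a minimal sketch, not code from the original post, and the function name pick_one_per_value_loop is hypothetical. For each distinct value it collects the positions holding that value and draws one of them with sample.int(), which also works when a value occurs only once.

```r
# Hypothetical helper sketching the loop approach (not from the original post):
# for each distinct value of v, pick one of the positions where it occurs.
pick_one_per_value_loop <- function(v) {
  vals <- unique(v)
  out <- integer(length(vals))
  for (i in seq_along(vals)) {
    pos <- which(v == vals[i])                   # all positions holding this value
    out[i] <- pos[sample.int(length(pos), 1L)]   # safe even when length(pos) == 1
  }
  out
}

a <- c(1, 1, 1, 2, 2, 3, 3, 3)
b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)
pick_one_per_value_loop(a)
pick_one_per_value_loop(b)  # the last index is always 9, the unique 4
```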

Edit:

Ideally, the solution should also always select an element if it is already unique. That is, my vector could also be:

b <- c(1,1,1,2,2,3,3,3,4) ## The number four is unique and should always be drawn

Answer 1

Using ave from base R, we can do something like:

unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 5 6

unique(ave(seq_along(a), a, FUN = function(x) if(length(x) > 1) head(sample(x), 1) else x))
#[1] 3 4 7

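This builds the index vector seq_along(a), groups it by the values of a, and picks one random index within each group. ave() writes that single index back over every position in the group, so unique() reduces the result to one index per distinct value.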
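
To make the ave() approach reusable, it could be wrapped in a small function and checked against the vector b from the edit. This is a sketch. The name pick_one_per_value is hypothetical, and the body is the answer's own one-liner. unique() is safe here because indices chosen for different groups can never coincide.

```r
# Hypothetical wrapper around the ave() one-liner from the answer.
pick_one_per_value <- function(v) {
  idx <- seq_along(v)
  # Within each group of equal values, every index is overwritten with one
  # randomly chosen index from that group; singleton groups keep their index.
  chosen <- ave(idx, v, FUN = function(x) if (length(x) > 1) head(sample(x), 1) else x)
  unique(chosen)  # one index per distinct value, in order of first appearance
}

b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)
pick_one_per_value(b)  # the unique 4 (index 9) is always kept
```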

The same logic works with sapply and split:

sapply(split(seq_along(a), a), function(x) if(length(x) > 1) head(sample(x), 1) else x)

It also works with tapply:

tapply(seq_along(a), a, function(x) if(length(x) > 1) head(sample(x), 1) else x)
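
The sapply/split and tapply variants return slightly different structures, which may matter when the result is used for indexing. The sketch below assumes the same selection function as the answer, pulled out into a helper (the name pick is hypothetical). sapply() gives a named integer vector, tapply() gives a one-dimensional array, and in both the names are the distinct values. as.vector() turns the tapply() result into a plain index vector.

```r
b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)

# Same selection logic as in the answer; "pick" is a hypothetical name.
pick <- function(x) if (length(x) > 1) head(sample(x), 1) else x

res_sapply <- sapply(split(seq_along(b), b), pick)  # named integer vector
res_tapply <- tapply(seq_along(b), b, pick)         # 1-d array with dimnames

names(res_sapply)      # the distinct values of b: "1" "2" "3" "4"
as.vector(res_tapply)  # plain index vector with the names dropped
b[as.vector(res_tapply)]
```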

We need the length check (if(length(x) > 1)) because of this note in ?sample:

If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x.

So when a group contains only a single index n, sample(n) draws from 1:n instead of returning n itself. That is why we check the length first.
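
The gotcha, and one way around it, can be shown in a few lines. This is a sketch. The resample() helper follows the pattern suggested in the examples of the ?sample help page: it indexes into x through sample.int(), so a length-1 x comes back unchanged and the explicit length check is no longer needed.

```r
x <- 9L
sample(x)  # a random permutation of 1:9, not just 9

# Helper along the lines of the resample() example in ?sample:
# sample positions, then index, so length(x) == 1 is handled correctly.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(9L)                 # always 9
resample(c(6L, 7L, 8L), 1)   # one of 6, 7, 8

b <- c(1, 1, 1, 2, 2, 3, 3, 3, 4)
# With resample(), the ave() approach needs no length check.
unique(ave(seq_along(b), b, FUN = function(x) resample(x, 1)))
```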
