Question

I have a dataset of 2794 rows, and I want to extract a subset, randomly reassign its classes, and put it back into the data frame.

Example

| Index | B | C | Class|
| 1     | 3 | 4 | Dog  |
| 2     | 1 | 9 | Cat  |
| 3     | 9 | 1 | Dog  |
| 4     | 1 | 1 | Cat  |

From the example above, I want to randomly draw N observations from the column 'Class' and shuffle them, so that I end up with something like this:

| Index | B | C | Class | Note       |
| 1     | 3 | 4 | Cat   | Re-sampled |
| 2     | 1 | 9 | Dog   | Re-sampled |
| 3     | 9 | 1 | Dog   |            |
| 4     | 1 | 1 | Dog   | Re-sampled |

This code randomly extracts rows and resamples them, but I don't want to pull the rows out of the data frame; I want to keep them in it:

 sample(Class[sample(nrow(Class),N),])

Answer 1

Assuming df is your data frame:

df <- data.frame(index=1:4, B=c(3,1,9,1), C=c(4,9,1,1), Class=c("Dog", "Cat", "Dog", "Cat"))

Does this do what you want?

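N <- 2  # number of rows whose classes get shuffled; must be <= nrow(df)
dfSamp <- sample(seq_len(nrow(df)), N)        # pick N random row indices
df$Class[dfSamp] <- sample(df$Class[dfSamp])  # permute the classes among those rows only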
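Answer 1 permutes the labels only among the selected rows, so the overall class counts stay the same and every unselected row is untouched. A minimal sketch wrapping it in a reusable function; the name `shuffle_subset` is hypothetical, and `stringsAsFactors = FALSE` is set so the example behaves the same on R versions before and after 4.0.

```r
# Hypothetical helper wrapping Answer 1: shuffle the Class labels
# among n randomly chosen rows, leaving every other row untouched.
shuffle_subset <- function(df, n) {
  stopifnot(n <= nrow(df))
  idx <- sample(seq_len(nrow(df)), n)     # rows whose labels will be permuted
  # sample(x) with no size argument returns a permutation of x
  df$Class[idx] <- sample(df$Class[idx])
  df
}

set.seed(1)
df <- data.frame(index = 1:4, B = c(3, 1, 9, 1), C = c(4, 9, 1, 1),
                 Class = c("Dog", "Cat", "Dog", "Cat"),
                 stringsAsFactors = FALSE)
df2 <- shuffle_subset(df, 2)

# Class totals are identical before and after the shuffle
table(df$Class)
table(df2$Class)
```

With the asker's data, `shuffle_subset(df, N)` would be called on the 2794-row data frame instead of the toy one.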

Answer 2

I mocked up a data frame and made an example:

df <- data.frame(
  ID=1:4,
  Class=c('Dog', 'Cat', 'Dog', 'Cat')
)

N <- 2
sample_ids <- sample(nrow(df), N)

df$Class[sample_ids] <- sample(df$Class, length(sample_ids))
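Unlike Answer 1, Answer 2 draws the replacement labels (without replacement) from the whole Class column, not just from the selected rows, so the class totals can change: with Dog, Cat, Dog, Cat, both selected rows could become "Dog". A small sketch, with a fixed seed and a saved copy of the column (the names `original` and `untouched` are illustrative), that checks the rows outside `sample_ids` are left alone:

```r
set.seed(42)
df <- data.frame(ID = 1:4, Class = c("Dog", "Cat", "Dog", "Cat"),
                 stringsAsFactors = FALSE)
N <- 2
sample_ids <- sample(nrow(df), N)

original <- df$Class
# Replacement labels come from the WHOLE column, not only the selected rows
df$Class[sample_ids] <- sample(original, length(sample_ids))

# Rows outside sample_ids keep their original class
untouched <- setdiff(seq_len(nrow(df)), sample_ids)
all(df$Class[untouched] == original[untouched])
```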

Answer 3

Assuming `format` is what you named your data, you can do this:

Class

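This takes 100 observations from the original data frame and stacks them onto the bottom. I also added a column so you know whether an observation was sampled or was in the data frame from the start.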
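The code for this answer did not survive, so the following is only a sketch of the approach it describes, not the answerer's original. It assumes a hypothetical 2794-row stand-in for the asker's data (named `format`, as in the answer) and a hypothetical flag column `sampled`. Note that it adds rows rather than reassigning classes in place.

```r
set.seed(100)
# Hypothetical stand-in for the asker's 2794-row data set
format <- data.frame(index = 1:2794,
                     B = sample(1:10, 2794, replace = TRUE),
                     C = sample(1:10, 2794, replace = TRUE),
                     Class = sample(c("Cat", "Dog"), 2794, replace = TRUE),
                     stringsAsFactors = FALSE)

# Flag every original row as "not sampled"
format$sampled <- FALSE

# Draw 100 rows, flag them, and stack them underneath the original data
extra <- format[sample(nrow(format), 100), ]
extra$sampled <- TRUE
format <- rbind(format, extra)

table(format$sampled)
```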

Answer 4

What you want to do is replace the class of some rows in place while leaving the others alone.

So if we start with the data frame df:

set.seed(100)
df = data.frame(index = 1:100,
                B = sample(1:10,100,replace = T),
                C = sample(1:10,100,replace = T),
                Class = sample(c('Cat','Dog','Bunny'),100,replace = T))

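and you want to update 5 random rows, we need to choose which rows to update and which new classes to put in those rows. Because we sample from unique(df$Class), the new classes are not weighted by how often each class currently occurs. You can adjust this with the prob argument of sample(), or drop unique() so that the existing frequencies act as the weights.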

n_rows = 5
rows_to_update = sample(1:100,n_rows,replace = F)
new_classes = sample(unique(df$Class),n_rows,replace = T)
rows_to_update
#> [1] 85 65 94 60 48
new_classes
#> [1] "Bunny" "Dog"   "Dog"   "Dog"   "Bunny"
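The two weighting options mentioned above can be sketched as follows. This uses only base R's `sample()` and `table()`; the variable names `new_classes_a`, `new_classes_b` and `freq` are illustrative, and the data frame is trimmed to the Class column for brevity.

```r
set.seed(100)
df <- data.frame(index = 1:100,
                 Class = sample(c("Cat", "Dog", "Bunny"), 100, replace = TRUE),
                 stringsAsFactors = FALSE)
n_rows <- 5

# Option 1: drop unique(). Every existing row is equally likely to be drawn,
# so each class is chosen in proportion to how often it occurs.
new_classes_a <- sample(df$Class, n_rows, replace = TRUE)

# Option 2: sample from the distinct classes, with explicit weights via `prob`.
# names(freq) and as.numeric(freq) are in the same (sorted) order.
freq <- table(df$Class)
new_classes_b <- sample(names(freq), n_rows, replace = TRUE,
                        prob = as.numeric(freq) / sum(freq))
```

`prob` does not have to sum to 1 (sample() normalises it), but dividing by the total makes the intent explicit.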

We can check what the original data looks like:

df[rows_to_update,]
#>    index B  C Class
#> 85    85 1  2   Dog
#> 65    65 5  1 Bunny
#> 94    94 5 10   Dog
#> 60    60 3  7 Bunny
#> 48    48 9  1   Cat

We can update it by indexing the column with the rows to update:

df$Class[rows_to_update] = new_classes
df[rows_to_update,]
#>    index B  C Class
#> 85    85 1  2 Bunny
#> 65    65 5  1   Dog
#> 94    94 5 10   Dog
#> 60    60 3  7   Dog
#> 48    48 9  1 Bunny
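To confirm that the update really stays in place, one option (a sketch, not part of the original answer) is to keep a copy of the data frame before updating and compare afterwards. The copy `before` and the vector `changed` are illustrative names; the row indices are drawn afresh here, so they will not necessarily match the output shown above.

```r
set.seed(100)
df <- data.frame(index = 1:100,
                 B = sample(1:10, 100, replace = TRUE),
                 C = sample(1:10, 100, replace = TRUE),
                 Class = sample(c("Cat", "Dog", "Bunny"), 100, replace = TRUE),
                 stringsAsFactors = FALSE)
before <- df  # copy taken before any update

n_rows <- 5
rows_to_update <- sample(seq_len(nrow(df)), n_rows)
df$Class[rows_to_update] <- sample(unique(df$Class), n_rows, replace = TRUE)

# Rows whose Class differs from the copy: always a subset of rows_to_update,
# because a chosen row can randomly receive the class it already had
changed <- which(df$Class != before$Class)
all(changed %in% rows_to_update)

# No rows were added or removed and the other columns are untouched
identical(df[, c("index", "B", "C")], before[, c("index", "B", "C")])
```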

Randomly sample only a subset of data in R
