How to randomly choose only one row in each group

Time: 2016-03-14 22:20:11

Tags: r random data.table

Say I have a dataframe as follows:

df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
                 Combo = c(1,2,3,1,2,1,1,2,3,4))
> df
   Region Combo
1       A     1
2       A     2
3       A     3
4       B     1
5       B     2
6       C     1
7       D     1
8       D     2
9       D     3
10      D     4

For each Region (A, B, C, D), I would like to randomly choose exactly one of the possible combos for that region.

If the chosen combination were indicated by a binary variable, the result might look something like this:

   Region Combo RandomlyChosen
1       A     1              1
2       A     2              0
3       A     3              0
4       B     1              0
5       B     2              1
6       C     1              1
7       D     1              0
8       D     2              0
9       D     3              1
10      D     4              0

I'm aware of the sample function, but just don't know how to choose only one combo within each region.

I regularly use data.table, so solutions using it are welcome, though solutions that don't use data.table are equally welcome.

Thanks!

1 Answer:

Answer 0: (score: 1)

In plain R, you can use sample() inside tapply():

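# Start with no row chosen
df$Chosen <- 0
# Sample one (negated) row number per Region, negate it back, and flag that row
df[-tapply(-seq_along(df$Region), df$Region, sample, size = 1), ]$Chosen <- 1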
df
   Region Combo Chosen
1       A     1      0
2       A     2      1
3       A     3      0
4       B     1      1
5       B     2      0
6       C     1      1
7       D     1      0
8       D     2      0
9       D     3      1
10      D     4      0

Note the -(-selected_row_number) trick: it prevents sample() from sampling from 1 to n when a group contains only a single row number n.
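To make the note about single-row groups concrete, here is a minimal sketch of the pitfall, followed by one possible way to avoid the negation trick by sampling a position within each group instead of a row number. The helper name pick_one and the variables one_row_group and dt are made up for illustration and are not part of the original answer. The data.table variant at the end is only an assumption about how the same idea could be written there, and it runs only if that package is installed.

```r
# Toy data from the question
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
                 Combo  = c(1,2,3,1,2,1,1,2,3,4))

# The pitfall: sample() treats a single number n >= 1 as the range 1:n
one_row_group <- 6                 # row number of the only row in Region "C"
sample(one_row_group, size = 1)    # may return any value in 1..6
sample(-one_row_group, size = 1)   # always -6, since -6 is not >= 1

# Alternative sketch: sample a position inside each group with sample.int(),
# which never sees the row numbers themselves
pick_one <- function(i) as.integer(seq_along(i) == sample.int(length(i), 1))
df$Chosen <- ave(seq_along(df$Region), df$Region, FUN = pick_one)
df

# Hypothetical data.table variant of the same idea (skipped if not installed)
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(df[, c("Region", "Combo")])
  dt[, Chosen := as.integer(seq_len(.N) == sample.int(.N, 1)), by = Region]
  print(dt)
}
```

Because the draw is random, the flagged rows differ from run to run; call set.seed() first if the result needs to be reproducible.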