Say I have a dataframe as follows:
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
Combo = c(1,2,3,1,2,1,1,2,3,4))
> df
   Region Combo
1       A     1
2       A     2
3       A     3
4       B     1
5       B     2
6       C     1
7       D     1
8       D     2
9       D     3
10      D     4
For each Region (A, B, C, D), I would like to randomly choose exactly one of the possible combos for that region.
If the chosen combo were indicated by a binary variable, the result might look like this:
   Region Combo RandomlyChosen
1       A     1              1
2       A     2              0
3       A     3              0
4       B     1              0
5       B     2              1
6       C     1              1
7       D     1              0
8       D     2              0
9       D     3              1
10      D     4              0
I'm aware of the sample function, but just don't know how to choose only one combo within each region.
I regularly use data.table, so solutions using it are welcome, though solutions not using data.table are equally welcome.
Thanks!
Answer 0 (score: 1)
In base R, you can use sample() inside tapply():
df$Chosen <- 0
# Sample one negated row number per Region, then flip the sign back to index the chosen rows
df[-tapply(-seq_along(df$Region), df$Region, sample, size = 1), ]$Chosen <- 1
df
   Region Combo Chosen
1       A     1      0
2       A     2      1
3       A     3      0
4       B     1      1
5       B     2      0
6       C     1      1
7       D     1      0
8       D     2      0
9       D     3      1
10      D     4      0
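Note the -(-selected_row_number) trick: negating the row numbers before sampling prevents sample() from drawing from 1 to n when a group consists of a single row whose number is n.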
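
If the sign trick feels too implicit, one possible alternative is to sample a position within each group rather than the row numbers themselves. The sketch below assumes the same df as in the question; pick_one is a hypothetical helper name introduced only for illustration, not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(Region = c("A","A","A","B","B","C","D","D","D","D"),
                 Combo  = c(1,2,3,1,2,1,1,2,3,4))

# The pitfall being avoided: sample() treats a single number n >= 1 as the
# range 1:n, so sample(6, size = 1) may return any of 1..6, not just 6.

# pick_one() is a hypothetical helper: it draws a position inside the group
# with sample.int(), so a one-row group always returns its own row number.
pick_one <- function(rows) rows[sample.int(length(rows), 1)]

# One chosen row number per Region.
chosen_rows <- tapply(seq_len(nrow(df)), df$Region, pick_one)

# Flag the chosen rows with 1, all others with 0.
df$Chosen <- 0
df$Chosen[chosen_rows] <- 1
df
```

Because the draw is random, the flagged rows differ between runs; calling set.seed() beforehand makes the result reproducible. Region C has a single row, so row 6 is flagged every time.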