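I have a data frame like this:
var
1122.46
4599.99
2000.56
5249.99
I need to randomly assign the value A to 20% of the rows, B to 30% of the rows, and C to 50% of the rows.
Is there an efficient way to do this?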
Answer 0 (score: 0)
Suppose you have a data frame named df. Then you can write:
randvar = sample(c('A','B','C'),size = nrow(df),prob = c(0.2,0.3,0.5),replace = TRUE)
df$var = randvar
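To see what this does in practice, here is a minimal sketch on made-up data. The 1000-row data frame, the seed, and the column name group are assumptions for illustration only; the sketch writes to group rather than var so the numeric column is not overwritten. Because each row is drawn independently, the observed shares should land near 0.2, 0.3, and 0.5 but will generally not match them exactly.

```r
set.seed(42)  # hypothetical seed, only so the sketch is reproducible

# Made-up stand-in for the real data frame
df = data.frame(var = runif(1000, min = 1000, max = 6000))

# Each row independently gets A, B, or C with probability 0.2, 0.3, 0.5
df$group = sample(c('A', 'B', 'C'), size = nrow(df),
                  prob = c(0.2, 0.3, 0.5), replace = TRUE)

# Observed share of each label (close to, but usually not exactly, the targets)
props = prop.table(table(df$group))
print(round(props, 3))
```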
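Note that sampling with prob only gives these proportions on average; the actual counts vary from run to run. If you want exactly 20% "A", 30% "B", and 50% "C", it is no longer a one-liner. Assuming that c(0.2,0.3,0.5) * nrow(df) are all whole numbers, my answer is: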
n = nrow(df)
df$var = "C" # initialize every value to "C"
index = seq_len(n)
na = round(0.2 * n) # number of "A" rows; round() guards against floating-point error
nb = round(0.3 * n) # number of "B" rows
indexa = sample(index, na) # pick 20% of the indices for "A"
indexb = sample(setdiff(index, indexa), nb) # pick 30% for "B", excluding the "A" indices already picked
df$var[indexa] = "A" # assign "A" to df$var at indexa
df$var[indexb] = "B" # assign "B" to df$var at indexb
# the remaining 50% stay "C"
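An alternative sketch, not part of the answer above: build a vector containing the right number of each label and shuffle it. The helper name assign_exact and the column name group are hypothetical. When the proportions do not divide the row count evenly, this version hands the leftover rows to the labels with the largest fractional parts, which is one reasonable choice among several. It assumes props sums to 1.

```r
# Hypothetical helper: returns n labels with counts as close as possible
# to props * n, in random order. Assumes props sums to 1.
assign_exact = function(n, labels = c("A", "B", "C"), props = c(0.2, 0.3, 0.5)) {
  target = props * n
  counts = floor(target)
  # Rows not yet allocated because of rounding down
  leftover = n - sum(counts)
  if (leftover > 0) {
    # Give one extra row to the labels with the largest fractional parts
    extra = order(target - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[extra] = counts[extra] + 1
  }
  # rep() builds the labels in blocks; sample() shuffles them
  sample(rep(labels, times = counts))
}

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
df = data.frame(var = runif(10, min = 1000, max = 6000))  # made-up data
df$group = assign_exact(nrow(df))
print(table(df$group))  # 2 rows of A, 3 of B, 5 of C
```

Shuffling a prebuilt vector avoids the index bookkeeping and extends to any number of labels without extra sampling steps.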