My market response data has the following format:
head(df)
ID market q1 q2
470 France 1 3
625 Germany 0 2
155 Italy 1 6
648 Spain 0 5
862 France 1 7
699 Germany 0 8
460 Italy 1 6
333 Spain 1 5
776 Spain 1 4
and the following frequencies per market:
table(df$market)
France 140
Germany 300
Italy 50
Spain 75
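I need to create a data frame that contains a sample of 100 responses per market, drawn without replacement, or all of a market's responses if it has fewer than 100.
So the result should look like this: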
table(df_new$market)
France 100
Germany 100
Italy 50
Spain 75
Thanks in advance!
Answer 0 (score: 0)
The following seems to work:
set.seed(10)
DF = data.frame(c1 = sample(LETTERS[1:4], 25, TRUE), c2 = runif(25))
freqs = as.data.frame(table(DF$c1))
# sample size per group: 5, or the whole group if it has fewer than 5 rows
freqs$ss = pmin(freqs$Freq, 5)
#> freqs
# Var1 Freq ss
#1 A 4 4
#2 B 11 5
#3 C 7 5
#4 D 3 3
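# for each group, draw ss row indices without replacement
# (sample() treats a single number n as 1:n, so this assumes no group has exactly one row)
res = mapply(function(x, y) DF[sample(which(DF$c1 %in% x), y), ],
             x = freqs$Var1, y = freqs$ss, SIMPLIFY = FALSE)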
do.call(rbind, res)
# c1 c2
#5 A 0.3558977
#17 A 0.2289039
#6 A 0.5355970
#13 A 0.9546536
#3 B 0.2395891
#25 B 0.8015470
#10 B 0.4226376
#15 B 0.5005032
#19 B 0.7289646
#11 C 0.7477465
#9 C 0.8998325
#12 C 0.8226526
#1 C 0.7066469
#4 C 0.7707715
#23 D 0.4861003
#20 D 0.2498805
#21 D 0.1611833
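To apply the same idea to the data in the question, something along these lines might work. This is only a sketch: the `df` built here is a made-up stand-in with the market counts from the question (the `q1`/`q2` values are random placeholders), and `cap`, `rows_by_market` and `keep` are names chosen for illustration. It indexes with `sample.int()` so that a market with a single response would still be handled correctly.

```r
# Hypothetical stand-in for the asker's data: same market counts as in the question,
# placeholder values for q1 and q2.
set.seed(1)
df <- data.frame(
  ID     = seq_len(565),
  market = rep(c("France", "Germany", "Italy", "Spain"), times = c(140, 300, 50, 75)),
  q1     = rbinom(565, 1, 0.5),
  q2     = sample(1:8, 565, replace = TRUE)
)

cap <- 100  # maximum number of responses to keep per market

# Row indices of df, grouped by market.
rows_by_market <- split(seq_len(nrow(df)), df$market)

# For each market, draw min(group size, cap) rows without replacement.
# Picking positions with sample.int() avoids the sample() behaviour where
# a length-one vector n is treated as 1:n.
keep <- unlist(lapply(rows_by_market, function(idx) {
  idx[sample.int(length(idx), min(length(idx), cap))]
}), use.names = FALSE)

df_new <- df[keep, ]
table(df_new$market)
```

With the counts from the question, `table(df_new$market)` should show 100 rows each for France and Germany, 50 for Italy and 75 for Spain; which rows are picked depends on the seed.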