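I have a data table with a bin column and a value column. The bins repeat within the data frame. I want to select a predetermined number of values from each bin. That number can be found by looking up the bin in a reference data frame, which holds the bin numbers in one column and the corresponding num.to.sample values in a second column. The num.to.sample value should then be passed to the sampling function to select values from that bin.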
#Example data
data = as.data.frame(cbind(rep(1:3, each=6)))
colnames(data) = "bin"
data$value = rnorm(18)
#Reference file used to determine how many data$values to select based on data$bin
ref = as.data.frame(cbind(1:3))
colnames(ref) = "bin"
ref$num.to.sample = c(1,2,3)
#Sample function
#num should be determined by the num.to.sample value that the bin matches to in ref
samples = function(x, num){
sample(x, num, replace=FALSE);
}
#this code below works for selecting a specific number of values by bin
#how can this be turned into the num.to.sample value that would result from matching
#data$bin to ref$bin and returning ref$num.to.sample?
data.sample = data[unlist(tapply(1:nrow(data),data$bin, function(x) samples(x,2))),]
data.sample
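Any ideas?
Thanks!
Answer 0 (score: 1)
There may be a better way, but as a first pass you can merge the reference table into the data and then sample within each bin: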
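library(plyr)
# Attach num.to.sample to every row by joining on the shared "bin" column
data <- merge(data, ref)
# Within each bin, draw num.to.sample rows without replacement
ddply(data, "bin", function(x) x[sample(seq_len(nrow(x)), unique(x$num.to.sample)), ])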
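
If you would rather avoid the plyr dependency, the same lookup can probably be done in base R with match(). The following is only a sketch under a few assumptions that are not in the question: every bin in data also appears in ref, no bin asks for more rows than it contains, and the set.seed() call plus the names rows.by.bin, n.by.bin and picked are there purely for illustration.

```r
set.seed(42)  # assumption: fixed seed so the draw is repeatable

# Example data and reference table, built as in the question
data <- data.frame(bin = rep(1:3, each = 6), value = rnorm(18))
ref <- data.frame(bin = 1:3, num.to.sample = c(1, 2, 3))

# Row indices of data, grouped by bin
rows.by.bin <- split(seq_len(nrow(data)), data$bin)

# Look up the sample size for each bin by matching its label against ref$bin
n.by.bin <- ref$num.to.sample[match(names(rows.by.bin), ref$bin)]

# Draw n row indices from each bin without replacement.
# Sampling positions with sample.int() and then indexing idx avoids
# sample()'s special handling of a single number, which matters when a
# bin holds exactly one row.
picked <- unlist(
  Map(function(idx, n) idx[sample.int(length(idx), n)], rows.by.bin, n.by.bin),
  use.names = FALSE
)

data.sample <- data[picked, ]
data.sample
```

Unlike the merge() approach, this leaves data unchanged and does not add a num.to.sample column to the result.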