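Is there a better way to do this? I am using R's data.table to do some sampling.
The code below samples from a table (samp.from.data), using its weights and drawing a specific number of rows per CP group based on the counts, so that the sampled values can be added back to the original data: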
library(data.table)
library(dplyr)  # only needed for the %>% / group_by check below

count.data <- data.table(CP=LETTERS[1:10],
                         count=sample(10:60,10,replace=TRUE))
orig.data <- data.table(CP=rep(LETTERS[1:10],times=count.data$count),
                        vc=sample(letters[1:6],size=sum(count.data$count),replace=TRUE))
# check that count.data is a good representation of orig.data
orig.data %>% group_by(CP) %>% summarise(count=n())
samp.from.data <- data.table(CP=rep(LETTERS[1:10],each=20),
                             UID=seq(200),
                             weight=runif(200,1,2))
setkey(count.data,'CP')
setkey(samp.from.data,'CP')
setkey(orig.data,'CP')
ll <- count.data[samp.from.data,]
ll1 <- ll[,.SD[sample(.N,head(count,1),replace=TRUE,prob=weight)],by=CP]
setkey(ll1,'CP')
# Add in the sampled values to the original data
# Is there a better way to do the sampling and add the result back into the original data more directly?
orig.data$UID <- ll1[,UID]
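
For comparison, here is a minimal sketch of one possible more direct variant, not necessarily the best one. It assumes the group sizes in orig.data already equal the counts, so it draws .N weighted UIDs per CP group and assigns them by reference. The name `pool`, the seed, and the three-group stand-in tables are only for illustration and are not part of the code above.

```r
library(data.table)
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Small stand-ins for the tables in the question (assumed shapes)
count.data <- data.table(CP = LETTERS[1:3], count = c(4L, 2L, 5L))
orig.data <- data.table(CP = rep(count.data$CP, times = count.data$count))
samp.from.data <- data.table(CP = rep(LETTERS[1:3], each = 20),
                             UID = seq(60),
                             weight = runif(60, 1, 2))

# `pool` (hypothetical name): one candidate table per CP, looked up by name
pool <- split(samp.from.data, by = "CP")

# For each CP group in orig.data, draw .N weighted UIDs with replacement
# from that group's candidates and assign them by reference
orig.data[, UID := {
  p <- pool[[.BY$CP]]
  p$UID[sample.int(nrow(p), .N, replace = TRUE, prob = p$weight)]
}, by = CP]
```

Because the assignment happens inside each CP group, this version does not depend on orig.data and the sampled table being in the same row order, and it skips the intermediate ll and ll1 tables.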