data.table sampling from two tables

Time: 2015-07-17 12:46:18

Tags: r data.table

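Is there a better way to do this? I am doing some sampling with R data.table.

The code below samples from one table (samp.from.data) using weights, drawing a specific number of rows for each CP based on the counts, so that the sampled values can be added back to the original data: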

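library(data.table)
library(dplyr)  # only needed for the %>% / group_by() / summarise() check below

count.data <- data.table(CP=LETTERS[1:10],
                         count=sample(10:60,10,replace=TRUE))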

orig.data <- data.table(CP=rep(LETTERS[1:10],times=count.data$count),
                        vc=sample(letters[1:6],size=sum(count.data$count),replace=TRUE))

# check that count.data is a good representation of orig.data
orig.data %>% group_by(CP) %>% summarise(count=n())


samp.from.data <- data.table(CP=rep(LETTERS[1:10],each=20),
                             UID=seq(200),
                             weight=runif(200,1,2))

setkey(count.data,'CP')
setkey(samp.from.data,'CP')
setkey(orig.data,'CP')

ll <- count.data[samp.from.data,]

ll1 <- ll[,.SD[sample(.N,head(count,1),replace=TRUE,prob=weight)],by=CP]
setkey(ll1,'CP')

# Add in the sampled values to the original data
# Is there a better way to do the sampling and adding back into the original data more directly?
orig.data$UID <- ll1[,UID]
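
One possible way to do the sampling and the assignment in a single step is sketched below. It is not a confirmed answer, just one option under a few assumptions: every CP value in orig.data also appears in samp.from.data, the weights are positive, and data.table is recent enough to provide split() with a by argument. The name pools is introduced here only for illustration, and set.seed() is there only to make the sketch reproducible.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Same example data as in the question
count.data <- data.table(CP = LETTERS[1:10],
                         count = sample(10:60, 10, replace = TRUE))
orig.data <- data.table(CP = rep(LETTERS[1:10], times = count.data$count),
                        vc = sample(letters[1:6], size = sum(count.data$count), replace = TRUE))
samp.from.data <- data.table(CP = rep(LETTERS[1:10], each = 20),
                             UID = seq(200),
                             weight = runif(200, 1, 2))

# One sampling pool per CP, as a named list (hypothetical helper name)
pools <- split(samp.from.data, by = "CP")

# For each CP group of orig.data, draw .N UIDs with replacement from the
# matching pool, weighted by 'weight', and add them as a new column by reference
orig.data[, UID := {
  pool <- pools[[.BY$CP]]
  pool$UID[sample.int(nrow(pool), .N, replace = TRUE, prob = pool$weight)]
}, by = CP]
```

Because the number of draws comes from .N within each group of orig.data, this version does not need count.data or the keyed join, and it does not rely on orig.data and the sampled table being sorted the same way.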

0 answers:

No answers yet.