Fill a data frame from another data frame

Date: 2020-03-25 22:01:49

Tags: r data.table

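# Source data: 100 rows, two grouping columns and four value columns
data1 <- data.frame(Group1 = sample(1:2, 100, replace = TRUE),
                    Group2 = sample(c("a", "b"), 100, replace = TRUE),
                    V1 = sample(1:3, 100, replace = TRUE),
                    V2 = sample(0:1, 100, replace = TRUE),
                    V3 = sample(1:5, 100, replace = TRUE),
                    V4 = sample(1:2, 100, replace = TRUE))

# Target sizes: number of rows wanted for each group combination
data2 <- data.frame(Group1 = c(1, 1, 2, 2),
                    Group2 = c("a", "b", "a", "b"),
                    Size = c(900, 768, 651, 102))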

I would like to take a random sample from columns 'V1' to 'V4' of data1 and use it to fill data2.

I want to sample by 'Group1' and 'Group2', replicating n times for each group combination, where n is given by 'Size' in data2.

The desired output should have 900 + 768 + 651 + 102 = 2421 rows. I want to sample with replacement.
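
As a quick sanity check on these numbers, here is a minimal sketch. The names `target_rows`, `pool`, `no_repl` and `with_repl` are illustrative only, and the pool of 25 row indices is a hypothetical stand-in for one group of data1 (100 rows spread over four group combinations gives roughly 25 rows each). It shows why replacement is needed: each group has far fewer source rows than its 'Size'.

```r
# data2 as defined in the question
data2 <- data.frame(Group1 = c(1, 1, 2, 2),
                    Group2 = c("a", "b", "a", "b"),
                    Size = c(900, 768, 651, 102))

# Total number of rows the filled result should have
target_rows <- sum(data2$Size)  # 2421

# Hypothetical pool of row indices for a single group (about 25 of 100 rows)
pool <- 1:25

# Without replacement, asking for 900 draws from 25 values is an error
no_repl <- tryCatch(sample(pool, 900), error = function(e) "error")

# With replacement, the same request succeeds
with_repl <- sample(pool, 900, replace = TRUE)
```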

1 Answer:

Answer 0: (score: 1)

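Based on the previous question/answer, we can extract the column names that start with 'V' ('nm1'), join with the first dataset on 'Group1' and 'Group2', sample the row index with replace = TRUE, and use that index to fill in the values of the sampled columns.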

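library(data.table)
# Value columns to fill in: V1 ... V4
nm1 <- grep("^V\\d+", names(data1), value = TRUE)
# Join to attach Size to each data1 row (.I then equals the data1 row number),
# draw Size row indices per group with replacement, then copy the V columns.
setDT(data2)[data1, on = .(Group1, Group2)][
  , .(i_samp = sample(.I, Size, replace = TRUE)), by = .(Group1, Group2, Size)
][, (nm1) := data1[i_samp, nm1], by = .(Group1, Group2)][]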
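
For comparison, the same idea can be sketched in base R without data.table. This is not the method from the answer above, only an illustration of group-wise sampling with replacement. The helper `sample_group`, the names `v_cols` and `filled`, and the fixed seed are assumptions added for the example, and it assumes every group combination in data2 has at least one row in data1.

```r
set.seed(42)  # assumption: fixed seed only to make the example reproducible

data1 <- data.frame(Group1 = sample(1:2, 100, replace = TRUE),
                    Group2 = sample(c("a", "b"), 100, replace = TRUE),
                    V1 = sample(1:3, 100, replace = TRUE),
                    V2 = sample(0:1, 100, replace = TRUE),
                    V3 = sample(1:5, 100, replace = TRUE),
                    V4 = sample(1:2, 100, replace = TRUE),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Group1 = c(1, 1, 2, 2),
                    Group2 = c("a", "b", "a", "b"),
                    Size = c(900, 768, 651, 102),
                    stringsAsFactors = FALSE)

# Value columns to copy across (V1 ... V4)
v_cols <- grep("^V\\d+$", names(data1), value = TRUE)

# Hypothetical helper: draw n rows with replacement from one group of data1
sample_group <- function(g1, g2, n) {
  # Row numbers of data1 that belong to this group combination
  idx <- which(data1$Group1 == g1 & data1$Group2 == g2)
  stopifnot(length(idx) > 0)  # assumption: the group exists in data1
  # Index into idx instead of calling sample(idx, ...), because sample()
  # treats a length-one numeric x as the range 1:x
  picked <- idx[sample.int(length(idx), n, replace = TRUE)]
  data.frame(Group1 = g1, Group2 = g2, data1[picked, v_cols],
             row.names = NULL)
}

# One block of sampled rows per row of data2, stacked into a single data frame
filled <- do.call(rbind,
                  Map(sample_group, data2$Group1, data2$Group2, data2$Size))

nrow(filled)  # 2421
```

Indexing through `sample.int` avoids the edge case where a group has exactly one row in data1, in which case `sample(idx, ...)` would draw from 1:idx rather than from idx itself.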