R - Sampling in paired data

Date: 2015-11-20 17:11:22

Tags: r dplyr sample

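I am trying to randomly sample a variable in paired data. idmen is my pair identifier, idind is my person identifier, and jour is the variable that needs to be randomly subset. Within one idmen, the sampled jour has to be the same for both persons. For example, for idmen == 2 we need to keep either dimanche or vendredi.

Here is the data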

    idmen idind  jour actpr1
      1     1 lundi       111
      1     2 lundi       111
      2     1 dimanche    111
      2     2 dimanche    111
      2     1 vendredi    111
      2     2 vendredi    111
      3     1 dimanche    113
      3     2 dimanche    121
      3     1 lundi       111
      3     2 lundi       111

Here is the desired output (of course the output may differ, since the day has to be chosen at random)

I need to sample one day for each idmen.

     idmen idind  jour actpr1
      1     1 lundi       111
      1     2 lundi       111
      2     1 dimanche    111
      2     2 dimanche    111
      3     1 dimanche    113
      3     2 dimanche    121

I thought of something like

library(dplyr) 
dta %>% group_by(idmen, jour) %>% sample_n(2)

But I don't understand why this doesn't work.

Any clues?

structure(list(idmen = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3), idind = c(1, 
 2, 1, 2, 1, 2, 1, 2, 1, 2), jour = structure(c(3L, 3L, 1L, 1L, 
 7L, 7L, 1L, 1L, 3L, 3L), .Label = c("dimanche", "jeudi   ", "lundi   ", 
 "mardi   ", "mercredi", "samedi  ", "vendredi"), class = "factor"), 
actpr1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 1L, 
1L), .Label = c("111", "112", "113", "121", "122", "123", 
"131", "132", "141", "143", "144", "145", "146", "151", "211", 
"212", "213", "223", "231", "233", "241", "261", "262", "271", 
"272", "311", "312", "313", "324", "331", "332", "334", "335", 
"341", "342", "343", "351", "372", "373", "374", "381", "382", 
"384", "385", "399", "411", "412", "413", "414", "419", "422", 
"423", "429", "431", "433", "510", "511", "512", "513", "514", 
"521", "522", "523", "524", "531", "532", "533", "541", "542", 
"613", "614", "616", "621", "622", "623", "627", "631", "632", 
"633", "634", "635", "636", "637", "638", "641", "651", "653", 
"655", "658", "661", "662", "663", "665", "667", "668", "669", 
"671", "672", "673", "674", "678", "810", "811", "812", "813", 
"819", "911", "999"), class = "factor")), .Names = c("idmen", 
 "idind", "jour", "actpr1"), row.names = c(NA, -10L), class = "data.frame")

2 answers:

Answer 0: (score: 3)

Maybe try this:

> dta %>% group_by(idmen) %>% filter(jour == jour[sample(length(jour),1)])
Source: local data frame [6 x 4]
Groups: idmen [3]

  idmen idind     jour actpr1
  (dbl) (dbl)   (fctr) (fctr)
1     1     1 lundi       111
2     1     2 lundi       111
3     2     1 vendredi    111
4     2     2 vendredi    111
5     3     1 lundi       111
6     3     2 lundi       111

...although there may well be a "sample complete groups" function built into dplyr.
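As a possible explanation of why the attempt in the question returns every row: grouping by both idmen and jour makes each group exactly one pair (two rows), so sample_n(2) just returns each group in full. A minimal sketch of an alternative is shown below. It first draws one jour per idmen and then keeps all rows of that day. It assumes dplyr 1.0.0 or later for slice_sample(); the data frame is rebuilt by hand here (with character columns instead of factors) only so that the example is self-contained.

```r
library(dplyr)

# Small copy of the question's data (assumption: plain character columns)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi",
             "vendredi", "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c("111", "111", "111", "111", "111", "111",
             "113", "121", "111", "111"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only to make the random draw reproducible

# One row per (idmen, jour) combination, then draw one day per idmen
picked <- dta %>%
  distinct(idmen, jour) %>%
  group_by(idmen) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Keep every row (both persons) that belongs to the drawn day
res <- semi_join(dta, picked, by = c("idmen", "jour"))
res
```

Compared with the filter() one-liner, this draws from the distinct days, so each day of an idmen is equally likely even if some days had more rows than others.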

Answer 1: (score: 1)

Here is a base R solution:

dta[unlist(sample(as.data.frame(matrix(1:nrow(dta),nrow = 2)),10,replace=T)),]

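This exploits the fact that a data frame is a list. When sample() is used on a list, it draws whole elements, which here are entire columns of the helper data frame, each holding the two row indices of one pair. Calling unlist() on the result then gives row indices in which the two rows of a pair are always sampled together. As written, this draws 10 pairs with replacement, but of course that can be changed.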
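Note that the one-liner above relies on each pair occupying two consecutive rows, and it draws pairs without regard to idmen. If exactly one day per idmen is wanted, as in the question, a base R sketch could look like the following. The helper name sample_one_day is hypothetical, and the data frame is rebuilt by hand so that the example runs on its own.

```r
# Small copy of the question's data (assumption: plain character columns)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi",
             "vendredi", "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c("111", "111", "111", "111", "111", "111",
             "113", "121", "111", "111"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep all rows of one randomly drawn day per idmen
sample_one_day <- function(d) {
  jour_chr <- as.character(d$jour)  # also works if jour is a factor
  # For each idmen, draw one of its distinct days
  days <- tapply(jour_chr, d$idmen, function(x) {
    u <- unique(x)
    u[sample.int(length(u), 1)]  # sample.int avoids the sample(x, 1) pitfall
  })
  # Look up the drawn day for each row's idmen and compare
  drawn <- as.vector(days[as.character(d$idmen)])
  d[jour_chr == drawn, , drop = FALSE]
}

set.seed(42)  # only to make the random draw reproducible
res_base <- sample_one_day(dta)
res_base
```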