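Randomly drawing rows from a data frame based on unique values and column values

Date: 2017-09-28 16:22:56

Tags: r random data.table subset

I have a data frame with several descriptor variables (trt, individual, session). I want to randomly select a few rows from each possible trt x individual combination while controlling for the session variable, so that no two drawn rows share the same session number. Here is what my data frame looks like: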

trt <- c(rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9))
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
data <- rnorm(81, mean = 4, sd = 1)
df <- data.frame(trt, individual, session, data)
df
   trt individual session             data
1    A        Bob       1 3.72013685581385
2    A      Nancy       1 3.97225419000673
3    A        Tim       1 4.44714175686225
4    B        Bob       2 5.00024599458127
5    B      Nancy       2 3.43615965145765
6    B        Tim       2  6.7920094635501
7    C        Bob       3 4.36315054477571
8    C      Nancy       3 5.07117348146375
9    C        Tim       3 4.38503325758969
10   A        Bob       4 4.30677162933005
11   A      Nancy       4 1.89311687510669
12   A        Tim       4 3.09084920968413
13   B        Bob       5 3.10436190897144
14   B      Nancy       5 3.59454992439722
15   B        Tim       5 3.40778069131207
16   C        Bob       6 4.00171937800892
17   C      Nancy       6 0.14578811080644
18   C        Tim       6 4.20754733296227
19   A        Bob       7 3.69131009783284
20   A      Nancy       7  4.7025756891679
21   A        Tim       7 4.46196017363017
22   B        Bob       8 3.97573281432736
23   B      Nancy       8  4.5373185942686
24   B        Tim       8 2.40937847038141
25   C        Bob       9 4.57519884980087
26   C      Nancy       9 5.19143914630448
27   C        Tim       9 4.83144732833874
28   A        Bob      10 3.01769965527235
29   A      Nancy      10 5.17300616827746
30   A        Tim      10 4.65432284571663
31   B        Bob      11 4.50892032922527
32   B      Nancy      11 3.38082717995663
33   B        Tim      11 4.92022245677209
34   C        Bob      12 4.54149796547394
35   C      Nancy      12 3.21992774137179
36   C        Tim      12 3.74507360931023
37   A        Bob      13 3.39524949548056
38   A      Nancy      13 4.17518916890901
39   A        Tim      13 3.02932375225388
40   B        Bob      14 3.59660910672907
41   B      Nancy      14 2.08784850191654
42   B        Tim      14 3.98446125755258
43   C        Bob      15 4.01837496797085
44   C      Nancy      15 3.40610126858125
45   C        Tim      15 4.57107635588582
46   A        Bob      16 3.15839276840723
47   A      Nancy      16 2.19932140340504
48   A        Tim      16 4.77588798035668
49   B        Bob      17  4.3524768657397
50   B      Nancy      17 4.49071625925856
51   B        Tim      17 4.02576463486266
52   C        Bob      18 3.74783360762117
53   C      Nancy      18 2.84123227236184
54   C        Tim      18  3.2024114782253
55   A        Bob      19 4.93837445490921
56   A      Nancy      19  4.7103051496802
57   A        Tim      19 6.22083635045134
58   B        Bob      20  4.5177747677824
59   B      Nancy      20 1.78839270771153
60   B        Tim      20 5.07140678136995
61   C        Bob      21 3.47818616035335
62   C      Nancy      21 4.28526474048439
63   C        Tim      21 4.22597602946575
64   A        Bob      22 1.91700925257901
65   A      Nancy      22 2.96317997587458
66   A        Tim      22 2.53506974227672
67   B        Bob      23 5.52714403395316
68   B      Nancy      23  3.3618513551059
69   B        Tim      23 4.85869007113978
70   C        Bob      24  3.4367068543959
71   C      Nancy      24 4.47769879000349
72   C        Tim      24 5.77340483757836
73   A        Bob      25 4.78524317734622
74   A      Nancy      25 3.55373702554664
75   A        Tim      25 2.88541465503637
76   B        Bob      26 4.62885302019139
77   B      Nancy      26 3.59430293369092
78   B        Tim      26 2.29610255924296
79   C        Bob      27 4.38433001299722
80   C      Nancy      27 3.77825207859976
81   C        Tim      27 2.12163194694365

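How can I draw 2 rows from each trt x individual combination so that every session number is used at most once? Here is an example of the data frame I want: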

       trt individual session             data
    1    A        Bob       1 3.72013685581385
    5    B      Nancy       2 3.43615965145765
    7    C        Bob       3 4.36315054477571
    12   A        Tim       4 3.09084920968413
    15   B        Tim       5 3.40778069131207
    17   C      Nancy       6 0.14578811080644
    19   A        Bob       7 3.69131009783284
    29   A      Nancy      10 5.17300616827746
    31   B        Bob      11 4.50892032922527
    34   C        Bob      12 4.54149796547394
    39   A        Tim      13 3.02932375225388
    40   B        Bob      14 3.59660910672907
    47   A      Nancy      16 2.19932140340504
    51   B        Tim      17 4.02576463486266
    54   C        Tim      18  3.2024114782253
    59   B      Nancy      20 1.78839270771153
    71   C      Nancy      24 4.47769879000349
    81   C        Tim      27 2.12163194694365
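To pin down the requirement, here is a small check that a candidate result satisfies it: exactly 2 rows for every trt x individual combination and no repeated session. `is_valid_draw` is a hypothetical helper (not part of data.table), written as a sketch that assumes the column names used above. The desired output is selected by the row numbers shown, so the random `data` values don't matter.

```r
library(data.table)

# Hypothetical helper: TRUE if `sub` has exactly `n` rows for every
# trt x individual combination present in `full` and no session repeats.
is_valid_draw <- function(sub, full, n = 2) {
  sub <- as.data.table(sub)
  full <- as.data.table(full)
  counts <- sub[, .N, by = .(trt, individual)]
  n_combos <- uniqueN(full, by = c("trt", "individual"))
  nrow(counts) == n_combos &&
    all(counts$N == n) &&
    !anyDuplicated(sub$session)
}

# Rebuild the question's data
trt <- rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9)
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
df <- data.frame(trt, individual, session, data = rnorm(81, mean = 4, sd = 1))

# The desired output from the question, picked by row number
wanted <- df[c(1, 5, 7, 12, 15, 17, 19, 29, 31, 34, 39, 40, 47, 51, 54, 59, 71, 81), ]
is_valid_draw(wanted, df)
```

Rows 1 to 18, by contrast, give every combination 2 rows but reuse each session three times, so the same check rejects them.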

I have tried a few things without success.

I tried randomly drawing two rows per trt x individual combination, but I end up with duplicate session values:

setDT(df)
df[ , .SD[sample(.N, 2)] , keyby = .(trt, individual)]
    trt individual session             data
 1:   A        Bob      25  2.7560788894668
 2:   A        Bob      19 4.12040841647523
 3:   A      Nancy       4 5.35362338127901
 4:   A      Nancy      19 5.51636882737692
 5:   A        Tim      19 5.10553640201998
 6:   A        Tim       1 2.77380671625473
 7:   B        Bob      23 3.50585105164409
 8:   B        Bob       8 3.58167259470814
 9:   B      Nancy      23 2.85301307507985
10:   B      Nancy       8 2.85179395539781
11:   B        Tim      26 2.40666507132474
12:   B        Tim      20 3.31276311351286
13:   C        Bob      24 3.19076007024549
14:   C        Bob       3 3.59146613276121
15:   C      Nancy       9 4.46606667880457
16:   C      Nancy      15 2.25405252536256
17:   C        Tim      12 4.43111661206133
18:   C        Tim      27 4.23868848646589
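Why this attempt almost always fails: each combination is sampled independently, so nothing prevents two individuals in the same trt from drawing the same session. By a quick count, within one trt (9 sessions) the three individuals avoid each other with probability 21/36 times 10/36, about 0.16, so all three trts are clash-free only about 0.4% of the time. A sketch that estimates this by simulation (the seed and number of replicates are arbitrary choices):

```r
library(data.table)

# Rebuild the question's data
trt <- rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9)
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
dt <- data.table(trt, individual, session, data = rnorm(81, mean = 4, sd = 1))

set.seed(123)
# Repeat the naive per-group draw and record whether any session is reused
has_dup <- replicate(1000, {
  s <- dt[, .SD[sample(.N, 2)], keyby = .(trt, individual)]
  anyDuplicated(s$session) > 0
})
mean(has_dup)  # share of draws that reuse at least one session
```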

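I also tried first drawing one random row from each session and then drawing 2 rows per trt x individual combination, but this usually returns an error, because the one-row-per-session selection does not leave the same number of rows for every trt x individual combination: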

ind <- sapply( unique(df$session ) , function(x) sample( which(df$session == x) , 1) )
df.unique <- df[ind, ]
df.sub <- df.unique[, .SD[sample(.N, 2)] , by = .(trt, individual)]
Error in `[.data.frame`(df.unique, , .SD[sample(.N, 2)], by = .(trt, individual)) : 
  unused argument (by = .(trt, individual))
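The `[.data.frame` in this error message indicates that `df.unique` was still a plain data.frame at that point, and the data.frame `[` method has no `by` argument. Converting with `setDT()` gets past that particular error, but the problem described above remains: after keeping one random row per session, the number of rows per trt x individual combination varies, and `sample(.N, 2)` fails for any combination left with only 1 row. A sketch that makes the varying counts visible (seed arbitrary):

```r
library(data.table)

# Rebuild the question's data as a plain data.frame
trt <- rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9)
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
df <- data.frame(trt, individual, session, data = rnorm(81, mean = 4, sd = 1))

set.seed(42)
# One random row per session, as in the question
ind <- sapply(unique(df$session), function(x) sample(which(df$session == x), 1))
df.unique <- df[ind, ]
class(df.unique)  # "data.frame": its `[` method does not understand `by =`

setDT(df.unique)  # convert in place so data.table syntax works
counts <- df.unique[, .N, keyby = .(trt, individual)]
counts            # rows per combination vary; some may be below 2 or absent
```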

Thanks in advance for your help!

1 answer:

Answer 0 (score: 4)

There may be a cleverer way to do the sampling, but here is a straightforward idea to get you started in the meantime:

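setDT(df)
setkey(df, session)  # key on session so the not-join !.(usedsessions) matches on it

usedsessions = 0 # some value that's not a session number
df[, {
       # drop sessions already drawn by earlier groups, then draw 2 rows
       res = .SD[!.(usedsessions)][sample(.N, 2)]
       # j is evaluated in the same environment for every group,
       # so the updated vector carries over to the next group
       usedsessions = c(usedsessions, res$session)
       res
     }
   , by = .(trt, individual)]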
#    trt individual session     data
# 1:   A        Bob       7 4.256668
# 2:   A        Bob      25 2.431821
# 3:   A      Nancy      16 4.785859
# 4:   A      Nancy      19 4.865248
# 5:   A        Tim       4 3.303689
# 6:   A        Tim      13 3.550261
# 7:   B        Bob      26 3.987136
# 8:   B        Bob      17 3.283055
# 9:   B      Nancy      14 3.177226
#10:   B      Nancy       2 3.639542
#11:   B        Tim       8 2.168447
#12:   B        Tim       5 3.521123
#13:   C        Bob      21 3.284245
#14:   C        Bob      12 5.773098
#15:   C      Nancy      24 4.624428
#16:   C      Nancy       9 3.235467
#17:   C        Tim      18 4.001395
#18:   C        Tim      27 5.002110

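You may need to add handling for corner cases (e.g. when a combination has fewer than 2 unused sessions left, so no such sample exists).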
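One way to handle that corner case, sketched under the assumption that it's acceptable to simply retry: run the same greedy idea as an explicit loop over `split()` groups, return `NULL` when a group runs out of unused sessions, and retry a limited number of times. `draw_once` and `draw_unique_sessions` are hypothetical helper names, and the explicit loop replaces the answer's accumulator inside `j`. Visiting the groups in random order lets a retry take a different path; note that this kind of greedy retry is not guaranteed to sample uniformly from all valid draws.

```r
library(data.table)

# Hypothetical helper: one greedy pass. Groups are visited in random order and
# each draws `n` rows from sessions not yet used by an earlier group.
# Returns NULL when some group has fewer than `n` unused sessions left.
draw_once <- function(dt, n = 2) {
  groups <- split(dt, by = c("trt", "individual"))
  groups <- groups[sample(length(groups))]  # random visiting order
  used <- integer(0)
  out <- vector("list", length(groups))
  for (i in seq_along(groups)) {
    keep <- !(groups[[i]]$session %in% used)
    g <- groups[[i]][keep]
    if (nrow(g) < n) return(NULL)           # corner case: not enough sessions left
    pick <- g[sample.int(nrow(g), n)]
    used <- c(used, pick$session)
    out[[i]] <- pick
  }
  rbindlist(out)[order(trt, individual, session)]
}

# Hypothetical wrapper: retry the greedy pass, then give up with an error.
draw_unique_sessions <- function(dt, n = 2, max_tries = 100) {
  for (k in seq_len(max_tries)) {
    res <- draw_once(dt, n)
    if (!is.null(res)) return(res)
  }
  stop("no valid draw found after ", max_tries, " tries")
}

# Usage on the question's data
trt <- rep(c(rep("A", 3), rep("B", 3), rep("C", 3)), 9)
individual <- rep(c("Bob", "Nancy", "Tim"), 27)
session <- rep(1:27, each = 3)
set.seed(1)
dt <- data.table(trt, individual, session, data = rnorm(81, mean = 4, sd = 1))
res <- draw_unique_sessions(dt, n = 2)
res
```

With this data each trt has 9 sessions, so n = 4 (3 individuals x 4 = 12 sessions) can never succeed, and the wrapper stops with an error instead of failing inside `sample()`.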