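How do I draw a stratified random sample in R?

Time: 2016-11-10 13:00:34

Tags: r random

I have a data.frame called "per" with three variables: nrodocumento, cod_jer (42 groups) and grupo_fict (8 groups). I want to draw a random sample (as a data.frame) within each combination of cod_jer and grupo_fict.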

> dput(head(per))
structure(list(nrodocumento = c(49574917L, 54692750L, 54731807L, 
57364176L, 57364198L, 46867674L), cod_jer = c(1146L, 32L, 0L, 
0L, 0L, 0L), grupo_fict = c(3L, 1L, 8L, 1L, 1L, 1L)), .Names = c("nrodocumento", 
"cod_jer", "grupo_fict"), row.names = c(NA, 6L), class = "data.frame")

> head(per,n=100)

    nrodocumento cod_jer grupo_fict
1       49574917    1146          3
2       54692750      32          1
3       54731807       0          8
4       57364176       0          1
5       57364198       0          1
6       46867674       0          1
7       46867668       0          1
8       57364201       0          1
9       53767871       0          1
10      55339012       0          1
11      49204318       0          8
12      53743017       0          1
13      47622958       0          1
14      49019862       0          1
15      50167428       0          2
16      48783260       0          4
17      52020945     433          5
18      54486680     236          4
19      51402916       0          4
20      48543242       0          2
21      54671603       0          1
22      50644599       0          8
23      53293608       0          1
24      52742799       0          4
25      49815210       0          8
26      50967719     236          3
27      51938997       0          8
28      50057188     324          3
29      52754706       0          6
30      55322102       0          3
31      53040748       0          1
32      50321642       0          5
33      51621354     236          8
34      49611806       0          7
35      53347667       0          8
36      52462498       0          3
37      54158570       0          8
38      54034849       0          8
39      52507674     321          3
40      50218598     317          7
41      45078442     432          7
42      51491066       0          8
43      53278953       0          2
44      52661658       0          2
45      50092873     236          3
46      50308064       0          7
47      51941635       0          7
48      53527966       0          1
49      49614579       0          1
50      49450678     318          8
51      52953427    1146          7
52      52133221       0          8
53      53363128       0          7
54      52819643       0          1
55      47516589       0          1
56      52563137       0          3
57      49511296       0          7
58      54154013       0          2
59      50822420    1349          4
60      50822408    1349          4
61      50822414    1349          6
62      52339683       0          1
63      50026113       0          7
64      47328586       0          7
65      56041961       0          7
66      47756955     432          8
67      53158397       0          7
68      53151167       0          7
69      54710039       0          3
70      54408844     114          4
71      46286323     114          4
72      50310877       0          1
73      50929135       0          7
74      49817218       0          1
75      53604540       0          8
76      52812736    1147          1
77      53726314    1147          1
78      50835936       0          8
79      55429334       0          1
80      48421020     329          8
81      49800217       0          3
82      52818263       0          1
83      45884978       0          1
84      50203385       0          1
85      53433610       0          2
86      54515938       0          1
87      50263935       0          8
88      52439152       0          2
89      48424129     236          3
90      47031563       0          8
91      53577610      11          1
92      48759083      11          1
93      50344731     432          1
94      51164013       0          3
95      52026977     163          7
96      50965482       0          3
97      45947594     433          8
98      53357234       0          7
99      48367529       0          8
100     54286153       0          3


> table(per$cod_jer,per$grupo_fict)

          1    2    3    4    5    6    7    8
  0    3990 2296 1743 1453  356  250 2031 2051
  11    149   85   29   34   14    6   34   25
  13      2    4    1    0    0    0    1    1
  14      3    1    0    0    0    0    0    1
  32     37   12   13   10    3    1   23   13
  101    19   12    6    5    3    0    6   12
  102     2    0    0    0    0    0    0    0
  103    11   10    3    3    0    1    3    0
  104    17    8    1    7    2    1    7    9
  105    11   12    3    3    3    0    6   10
  106   147   57   30   29    8    1   43   42
  107    33   37    5    9    3    2    8    9
  108     6   10    2    3    0    2    3    4
  109    44   37   11    9    6    2   14   14
  111   112   81   26   28    8    3   22   18
  112    21    8    4    8    2    0    3    2
  113    94   61   14   16    4    1   17   24
  114    60   52   10   14    9    5    8   20
  115    72   24   21   13    5    1   11   16
  125     5    4    1    0    1    0    0    1
  138    15    5    2    2    1    0    2    0
  163    50   35   26   26    7   12   43   41
  234    51   43   31   32   10    7   49   53
  236    78   29   46   35    7    7   39   37
  317    44   28   21   13    7    2   28   21
  318    20   27    5   10    4    3   12   14
  319    45   21   25   19    1    2   26   21
  321     6    4    9    3    0    3    8    1
  322    43   30   24   16    5    3   16   34
  323    30   14   25   15    3    4   24   22
  324    59   29   31   27    8    5   28   27
  325    15   12    6    5    1    2    8   11
  326    18   12   17   13    4    2   20   15
  327    45   28   23   26    7    6   25   40
  328    52   49   33   32    5    9   31   35
  329    42   36   26   20    2    3   23   30
  431     6    2    4    1    2    0    2    6
  432    39   18   27   24    5    1   28   34
  433   139   92   90   89   18   13   61   66
  1146   97   49   26   14    7    5   24   29
  1147   56   33   26   25    9    0   19   20
  1349   15    9   11   10    0    1   10    3
  1544   62   33   20   32    4    3   25   43
  1545   37   13   22   14    1    3   14   31
  1848   16   27   11   15    3    0   10   12

On the other hand, I have a second data.frame, "vacantes", which gives the sample size I need within each group.

> dput(head(vacantes))
structure(list(cod_jer = c(101L, 316L, 325L, 1349L, 1544L, 102L
), vacantes = c(132, 180, 54, 63, 45, 0), vac1 = c(27, 36, 11, 
13, 9, 0), vac2 = c(27, 36, 11, 13, 9, 0), vac3 = c(24, 33, 10, 
12, 9, 0), vac4 = c(24, 33, 10, 12, 9, 0), vac5 = c(8, 11, 4, 
4, 3, 0), vac6 = c(8, 11, 4, 4, 3, 0), vac7 = c(7, 10, 3, 3, 
2, 0), vac8 = c(7, 10, 3, 3, 2, 0)), .Names = c("cod_jer", "vacantes", 
"vac1", "vac2", "vac3", "vac4", "vac5", "vac6", "vac7", "vac8"
), row.names = c(NA, 6L), class = "data.frame")

 > vacantes
    cod_jer vacantes vac1 vac2 vac3 vac4 vac5 vac6 vac7 vac8 
 1      101      132   27   27   24   24    8    8    7    7            
 2      316      180   36   36   33   33   11   11   10   10            
 3      325       54   11   11   10   10    4    4    3    3             
 4     1349       63   13   13   12   12    4    4    3    3             
 5     1544       45    9    9    9    9    3    3    2    2             
 6      102        0    0    0    0    0    0    0    0    0              
 7      103        0    0    0    0    0    0    0    0    0             
 8      104        0    0    0    0    0    0    0    0    0              
 9      105        0    0    0    0    0    0    0    0    0              
 10     106        0    0    0    0    0    0    0    0    0              
 11     107        0    0    0    0    0    0    0    0    0              
 12     108        0    0    0    0    0    0    0    0    0              
 13     109        0    0    0    0    0    0    0    0    0              
 14     110        0    0    0    0    0    0    0    0    0              
 15     111        0    0    0    0    0    0    0    0    0              
 16     112        0    0    0    0    0    0    0    0    0              
 17     113        0    0    0    0    0    0    0    0    0              
 18     114        0    0    0    0    0    0    0    0    0              
 19     115        0    0    0    0    0    0    0    0    0              
 20     137        0    0    0    0    0    0    0    0    0              
 21     138        0    0    0    0    0    0    0    0    0              
 22     139        0    0    0    0    0    0    0    0    0              
 23     140        0    0    0    0    0    0    0    0    0              
 24     234        0    0    0    0    0    0    0    0    0              
 25     236        0    0    0    0    0    0    0    0    0              
 26     317        0    0    0    0    0    0    0    0    0              
 27     318        0    0    0    0    0    0    0    0    0             
 28     319        0    0    0    0    0    0    0    0    0              
 29     320        0    0    0    0    0    0    0    0    0              
 30     321        0    0    0    0    0    0    0    0    0              
 31     322        0    0    0    0    0    0    0    0    0              
 32     323        0    0    0    0    0    0    0    0    0              
 33     324        0    0    0    0    0    0    0    0    0              
 34     326        0    0    0    0    0    0    0    0    0              
 35     327        0    0    0    0    0    0    0    0    0              
 36     328        0    0    0    0    0    0    0    0    0              
 37     329        0    0    0    0    0    0    0    0    0              
 38     431        0    0    0    0    0    0    0    0    0              
 39     432        0    0    0    0    0    0    0    0    0              
 40     433        0    0    0    0    0    0    0    0    0              
 41    1146        0    0    0    0    0    0    0    0    0              
 42    1147        0    0    0    0    0    0    0    0    0              
 43    1545        0    0    0    0    0    0    0    0    0              
 44    1630        0    0    0    0    0    0    0    0    0              
 45    1848        0    0    0    0    0    0    0    0    0                 
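
Since the sizes are stored in wide form (one vac column per grupo_fict), it may help to reshape them into one row per (cod_jer, grupo_fict) pair before sampling. The following is only a minimal base R sketch on the six rows shown in the dput() output above; it assumes that vac1 to vac8 correspond to grupo_fict 1 to 8, and the names `sizes`, `n` and `vac_cols` are chosen for this illustration.

```r
# First six rows of `vacantes`, copied from the dput() output above
vacantes <- data.frame(
  cod_jer  = c(101L, 316L, 325L, 1349L, 1544L, 102L),
  vacantes = c(132, 180, 54, 63, 45, 0),
  vac1 = c(27, 36, 11, 13, 9, 0), vac2 = c(27, 36, 11, 13, 9, 0),
  vac3 = c(24, 33, 10, 12, 9, 0), vac4 = c(24, 33, 10, 12, 9, 0),
  vac5 = c(8, 11, 4, 4, 3, 0),    vac6 = c(8, 11, 4, 4, 3, 0),
  vac7 = c(7, 10, 3, 3, 2, 0),    vac8 = c(7, 10, 3, 3, 2, 0)
)

# Assumption: column vacK holds the size for grupo_fict == K
vac_cols <- paste0("vac", 1:8)

# One row per (cod_jer, grupo_fict); `n` is the requested sample size.
# unlist() stacks the columns one after another (all of vac1, then vac2, ...),
# so cod_jer is repeated once per column and grupo_fict once per row.
sizes <- data.frame(
  cod_jer    = rep(vacantes$cod_jer, times = length(vac_cols)),
  grupo_fict = rep(1:8, each = nrow(vacantes)),
  n          = unlist(vacantes[vac_cols], use.names = FALSE)
)

# Look up the size requested for one stratum
sizes[sizes$cod_jer == 1349 & sizes$grupo_fict == 3, ]
```

A long table like this can be joined to "per" on both keys, which avoids relying on the order of a flattened size vector.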

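I want to draw a sample within each stratum, that is, each combination of cod_jer and grupo_fict. If the number of vacancies is 0, the sample size should be 0.

I am trying this: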

library(sampling)  # strata() comes from the sampling package

size = subset(vacantes, select = c(vac1, vac2, vac3, vac4, vac5, vac6, vac7, vac8))

size = as.matrix(size)
size = as.vector(size)

for (i in 1:length(size)) {
  if (size[i] > 0) {
    s = strata(per, c("cod_jer", "grupo_fict"), size = size, method = "srswor")
  } else {
    s = "0"
  }
}

But I can't get it to work :(

Any suggestions?

Thanks!
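
One thing that stands out in the attempt above is that the loop passes the whole `size` vector to strata() on every iteration and overwrites `s` each time, so at most the last result survives. As a possible alternative, here is a minimal base R sketch that does not use the sampling package: it attaches the requested size to each row, splits the data by stratum, and samples without replacement inside each one. The toy `per` and `sizes` data below are made up for illustration, and capping the size at the number of available rows is an assumption about the desired behaviour.

```r
set.seed(1)  # only to make the sketch reproducible

# Toy stand-in for `per` (made-up ids; 5 rows in each stratum)
per <- data.frame(
  nrodocumento = 1:60,
  cod_jer      = rep(c(101L, 1349L, 102L), each = 20),
  grupo_fict   = rep(1:4, times = 15)
)

# Requested size per stratum in long format (hypothetical values)
sizes <- data.frame(
  cod_jer    = rep(c(101L, 1349L, 102L), each = 4),
  grupo_fict = rep(1:4, times = 3),
  n          = c(3, 2, 0, 10,  1, 1, 1, 1,  0, 0, 0, 0)
)

# Attach the requested size to every row; strata missing from `sizes` get 0
per2 <- merge(per, sizes, by = c("cod_jer", "grupo_fict"), all.x = TRUE)
per2$n[is.na(per2$n)] <- 0

# Split into strata and draw without replacement within each one
strata_list <- split(per2, list(per2$cod_jer, per2$grupo_fict), drop = TRUE)
sampled <- do.call(rbind, lapply(strata_list, function(d) {
  k <- min(d$n[1], nrow(d))  # assumption: never ask for more rows than exist
  d[sample.int(nrow(d), k), , drop = FALSE]
}))
rownames(sampled) <- NULL

# Rows drawn per stratum
table(sampled$cod_jer, sampled$grupo_fict)
```

Strata with a requested size of 0 simply contribute no rows, so no special case is needed for them.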

0 answers:

No answers yet