I have a data.frame called "per" with three variables: nrodocumento, cod_jer (42 groups) and grupo_fict (8 groups). I want to draw a random sample (as a data.frame) within each combination of cod_jer and grupo_fict.
> dput(head(per))
structure(list(nrodocumento = c(49574917L, 54692750L, 54731807L,
57364176L, 57364198L, 46867674L), cod_jer = c(1146L, 32L, 0L,
0L, 0L, 0L), grupo_fict = c(3L, 1L, 8L, 1L, 1L, 1L)), .Names =
c("nrodocumento",
"cod_jer", "grupo_fict"), row.names = c(NA, 6L), class = "data.frame")
> head(per,n=100)
nrodocumento cod_jer grupo_fict
1 49574917 1146 3
2 54692750 32 1
3 54731807 0 8
4 57364176 0 1
5 57364198 0 1
6 46867674 0 1
7 46867668 0 1
8 57364201 0 1
9 53767871 0 1
10 55339012 0 1
11 49204318 0 8
12 53743017 0 1
13 47622958 0 1
14 49019862 0 1
15 50167428 0 2
16 48783260 0 4
17 52020945 433 5
18 54486680 236 4
19 51402916 0 4
20 48543242 0 2
21 54671603 0 1
22 50644599 0 8
23 53293608 0 1
24 52742799 0 4
25 49815210 0 8
26 50967719 236 3
27 51938997 0 8
28 50057188 324 3
29 52754706 0 6
30 55322102 0 3
31 53040748 0 1
32 50321642 0 5
33 51621354 236 8
34 49611806 0 7
35 53347667 0 8
36 52462498 0 3
37 54158570 0 8
38 54034849 0 8
39 52507674 321 3
40 50218598 317 7
41 45078442 432 7
42 51491066 0 8
43 53278953 0 2
44 52661658 0 2
45 50092873 236 3
46 50308064 0 7
47 51941635 0 7
48 53527966 0 1
49 49614579 0 1
50 49450678 318 8
51 52953427 1146 7
52 52133221 0 8
53 53363128 0 7
54 52819643 0 1
55 47516589 0 1
56 52563137 0 3
57 49511296 0 7
58 54154013 0 2
59 50822420 1349 4
60 50822408 1349 4
61 50822414 1349 6
62 52339683 0 1
63 50026113 0 7
64 47328586 0 7
65 56041961 0 7
66 47756955 432 8
67 53158397 0 7
68 53151167 0 7
69 54710039 0 3
70 54408844 114 4
71 46286323 114 4
72 50310877 0 1
73 50929135 0 7
74 49817218 0 1
75 53604540 0 8
76 52812736 1147 1
77 53726314 1147 1
78 50835936 0 8
79 55429334 0 1
80 48421020 329 8
81 49800217 0 3
82 52818263 0 1
83 45884978 0 1
84 50203385 0 1
85 53433610 0 2
86 54515938 0 1
87 50263935 0 8
88 52439152 0 2
89 48424129 236 3
90 47031563 0 8
91 53577610 11 1
92 48759083 11 1
93 50344731 432 1
94 51164013 0 3
95 52026977 163 7
96 50965482 0 3
97 45947594 433 8
98 53357234 0 7
99 48367529 0 8
100 54286153 0 3
> table(per$cod_jer,per$grupo_fict)
1 2 3 4 5 6 7 8
0 3990 2296 1743 1453 356 250 2031 2051
11 149 85 29 34 14 6 34 25
13 2 4 1 0 0 0 1 1
14 3 1 0 0 0 0 0 1
32 37 12 13 10 3 1 23 13
101 19 12 6 5 3 0 6 12
102 2 0 0 0 0 0 0 0
103 11 10 3 3 0 1 3 0
104 17 8 1 7 2 1 7 9
105 11 12 3 3 3 0 6 10
106 147 57 30 29 8 1 43 42
107 33 37 5 9 3 2 8 9
108 6 10 2 3 0 2 3 4
109 44 37 11 9 6 2 14 14
111 112 81 26 28 8 3 22 18
112 21 8 4 8 2 0 3 2
113 94 61 14 16 4 1 17 24
114 60 52 10 14 9 5 8 20
115 72 24 21 13 5 1 11 16
125 5 4 1 0 1 0 0 1
138 15 5 2 2 1 0 2 0
163 50 35 26 26 7 12 43 41
234 51 43 31 32 10 7 49 53
236 78 29 46 35 7 7 39 37
317 44 28 21 13 7 2 28 21
318 20 27 5 10 4 3 12 14
319 45 21 25 19 1 2 26 21
321 6 4 9 3 0 3 8 1
322 43 30 24 16 5 3 16 34
323 30 14 25 15 3 4 24 22
324 59 29 31 27 8 5 28 27
325 15 12 6 5 1 2 8 11
326 18 12 17 13 4 2 20 15
327 45 28 23 26 7 6 25 40
328 52 49 33 32 5 9 31 35
329 42 36 26 20 2 3 23 30
431 6 2 4 1 2 0 2 6
432 39 18 27 24 5 1 28 34
433 139 92 90 89 18 13 61 66
1146 97 49 26 14 7 5 24 29
1147 56 33 26 25 9 0 19 20
1349 15 9 11 10 0 1 10 3
1544 62 33 20 32 4 3 25 43
1545 37 13 22 14 1 3 14 31
1848 16 27 11 15 3 0 10 12
I also have a data.frame of vacancies, "vacantes", which gives the sample size I need within each group.
> dput(head(vacantes))
structure(list(cod_jer = c(101L, 316L, 325L, 1349L, 1544L, 102L
), vacantes = c(132, 180, 54, 63, 45, 0), vac1 = c(27, 36, 11,
13, 9, 0), vac2 = c(27, 36, 11, 13, 9, 0), vac3 = c(24, 33, 10,
12, 9, 0), vac4 = c(24, 33, 10, 12, 9, 0), vac5 = c(8, 11, 4,
4, 3, 0), vac6 = c(8, 11, 4, 4, 3, 0), vac7 = c(7, 10, 3, 3,
2, 0), vac8 = c(7, 10, 3, 3, 2, 0)), .Names = c("cod_jer", "vacantes",
"vac1", "vac2", "vac3", "vac4", "vac5", "vac6", "vac7", "vac8"
), row.names = c(NA, 6L), class = "data.frame")
> vacantes
cod_jer vacantes vac1 vac2 vac3 vac4 vac5 vac6 vac7 vac8
1 101 132 27 27 24 24 8 8 7 7
2 316 180 36 36 33 33 11 11 10 10
3 325 54 11 11 10 10 4 4 3 3
4 1349 63 13 13 12 12 4 4 3 3
5 1544 45 9 9 9 9 3 3 2 2
6 102 0 0 0 0 0 0 0 0 0
7 103 0 0 0 0 0 0 0 0 0
8 104 0 0 0 0 0 0 0 0 0
9 105 0 0 0 0 0 0 0 0 0
10 106 0 0 0 0 0 0 0 0 0
11 107 0 0 0 0 0 0 0 0 0
12 108 0 0 0 0 0 0 0 0 0
13 109 0 0 0 0 0 0 0 0 0
14 110 0 0 0 0 0 0 0 0 0
15 111 0 0 0 0 0 0 0 0 0
16 112 0 0 0 0 0 0 0 0 0
17 113 0 0 0 0 0 0 0 0 0
18 114 0 0 0 0 0 0 0 0 0
19 115 0 0 0 0 0 0 0 0 0
20 137 0 0 0 0 0 0 0 0 0
21 138 0 0 0 0 0 0 0 0 0
22 139 0 0 0 0 0 0 0 0 0
23 140 0 0 0 0 0 0 0 0 0
24 234 0 0 0 0 0 0 0 0 0
25 236 0 0 0 0 0 0 0 0 0
26 317 0 0 0 0 0 0 0 0 0
27 318 0 0 0 0 0 0 0 0 0
28 319 0 0 0 0 0 0 0 0 0
29 320 0 0 0 0 0 0 0 0 0
30 321 0 0 0 0 0 0 0 0 0
31 322 0 0 0 0 0 0 0 0 0
32 323 0 0 0 0 0 0 0 0 0
33 324 0 0 0 0 0 0 0 0 0
34 326 0 0 0 0 0 0 0 0 0
35 327 0 0 0 0 0 0 0 0 0
36 328 0 0 0 0 0 0 0 0 0
37 329 0 0 0 0 0 0 0 0 0
38 431 0 0 0 0 0 0 0 0 0
39 432 0 0 0 0 0 0 0 0 0
40 433 0 0 0 0 0 0 0 0 0
41 1146 0 0 0 0 0 0 0 0 0
42 1147 0 0 0 0 0 0 0 0 0
43 1545 0 0 0 0 0 0 0 0 0
44 1630 0 0 0 0 0 0 0 0 0
45 1848 0 0 0 0 0 0 0 0 0
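Before sampling, it may be worth comparing the requested sizes with the counts actually available in per. The sketch below does that for two cod_jer values, with the numbers copied by hand from the two printouts above. It assumes that column vacN holds the size for grupo_fict N, which the post does not state explicitly. The matrix names avail, wanted, shortfall and feasible are only illustrative.

```r
# Counts available in per, copied from the table(per$cod_jer, per$grupo_fict)
# output for cod_jer 101 and 1544.
avail <- rbind(
  "101"  = c(19, 12,  6,  5, 3, 0,  6, 12),
  "1544" = c(62, 33, 20, 32, 4, 3, 25, 43)
)
# Requested sizes, copied from vac1..vac8 in vacantes.
# Assumption: column vacN is the size for grupo_fict == N.
wanted <- rbind(
  "101"  = c(27, 27, 24, 24, 8, 8, 7, 7),
  "1544" = c( 9,  9,  9,  9, 3, 3, 2, 2)
)
colnames(avail)  <- as.character(1:8)
colnames(wanted) <- as.character(1:8)

# Positive entries mark strata where the request exceeds what is available.
shortfall <- pmax(wanted - avail, 0)
print(shortfall)

# Sizes that could actually be drawn without replacement.
feasible <- pmin(wanted, avail)
print(feasible)
```

With these numbers, cod_jer 101 asks for more rows than exist in seven of its eight groups, so sampling without replacement at the requested sizes cannot succeed there unless the sizes are capped. Note also that cod_jer 316 has vacancies but does not appear in the table of per shown above.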
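I want to draw a stratified sample within each combination of cod_jer and grupo_fict. If the number of vacancies is 0, the sample size should be 0.
This is what I am trying: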
library(sampling)  # strata() comes from the sampling package
size = subset(vacantes, select = c(vac1, vac2, vac3, vac4, vac5, vac6, vac7, vac8))
size = as.matrix(size)
size = as.vector(size)
for (i in 1:length(size)) {
  if (size[i] > 0) {
    s = strata(per, c("cod_jer", "grupo_fict"), size = size,
               method = "srswor")
  } else {
    s = "0"
  }
}
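One detail of the snippet above that may matter: as.vector() on a matrix flattens it column by column, so size lists every vac1 value first, then every vac2 value, and so on. As far as I understand the sampling documentation, strata() expects the sizes in the order in which the strata appear in the (sorted) data, so the ordering is worth checking. The loop also passes the whole size vector on every iteration and overwrites s each time. The toy data frame below is hypothetical and only illustrates the flattening order.

```r
# Toy stand-in for the vac columns (hypothetical values, two columns only).
toy <- data.frame(vac1 = c(1, 2, 3), vac2 = c(10, 20, 30))

# as.matrix() followed by as.vector() walks down each column in turn.
flat <- as.vector(as.matrix(toy))
print(flat)

# Transposing first gives row-by-row order instead: one cod_jer at a time.
by_row <- as.vector(t(as.matrix(toy)))
print(by_row)
```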
But I can't get it to work :(
Any suggestions?
Thanks!
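For what it is worth, here is a minimal sketch of one possible approach in base R, without the sampling package. It is not a tested solution for the real data. It runs on small made-up stand-ins (per_toy, vac_toy), assumes that column vacN holds the size for grupo_fict N, and caps each request at the number of rows available in the stratum.

```r
set.seed(1)  # only to make the toy example reproducible

# Toy stand-ins for per and vacantes (hypothetical values, 2 groups only).
per_toy <- data.frame(
  nrodocumento = 1:12,
  cod_jer      = rep(c(101L, 102L), each = 6),
  grupo_fict   = rep(c(1L, 2L), times = 6)
)
vac_toy <- data.frame(cod_jer = c(101L, 102L), vac1 = c(2, 0), vac2 = c(5, 0))

# Wide to long: one row per (cod_jer, grupo_fict) with its requested size.
# Assumption: column vacN holds the size for grupo_fict == N.
vac_cols <- grep("^vac[0-9]+$", names(vac_toy), value = TRUE)
sizes <- data.frame(
  cod_jer    = rep(vac_toy$cod_jer, times = length(vac_cols)),
  grupo_fict = rep(as.integer(sub("vac", "", vac_cols)), each = nrow(vac_toy)),
  n          = unlist(vac_toy[vac_cols], use.names = FALSE)
)

# Draw rows without replacement inside each stratum, never asking for more
# rows than the stratum contains; strata with n == 0 contribute nothing.
picked <- lapply(seq_len(nrow(sizes)), function(i) {
  rows <- which(per_toy$cod_jer == sizes$cod_jer[i] &
                per_toy$grupo_fict == sizes$grupo_fict[i])
  k <- min(sizes$n[i], length(rows))
  if (k == 0) return(integer(0))
  rows[sample.int(length(rows), k)]
})
sampled <- per_toy[sort(unlist(picked)), ]
print(table(sampled$cod_jer, sampled$grupo_fict))
```

Using sample.int() on the row positions avoids the surprise that sample(x, k) treats a single number x as 1:x. Whether capping the size is acceptable, or whether a shortfall should raise an error instead, depends on what the vacancies are meant to represent.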