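I'm working with a simple dataset extracted from Cars93 in R's MASS package. I'm running randomForest on it to predict Origin (USA or non-USA) from the other four predictors. If I refer to the response by its explicit position in the dataset, as below (it is the fifth column of tempdata), I get completely different answers than when I refer to it by name. It's a factor, and I start from the same random seed in both cases.

Can anyone shed some light on this? If I look at tempdata$Origin == tempdata[,5], I get a vector of length 93 (all 93 data points are TRUE), so the two are exactly the same. Thanks in advance for your help.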
> set.seed(2000)
> testF <- randomForest(tempdata3[,5] ~ ., data = tempdata3[,1:5], ntree = 100,
+                       na.action = na.omit, seed = 2014)
> testF$votes[1:10,]
USA non-USA
2 0.1315789 0.86842105
6 0.9487179 0.05128205
8 1.0000000 0.00000000
8.1 1.0000000 0.00000000
10 1.0000000 0.00000000
12 0.9729730 0.02702703
12.1 0.9756098 0.02439024
13 0.9512195 0.04878049
13.1 0.9705882 0.02941176
13.2 0.9259259 0.07407407
> set.seed(2000)
> testF <- randomForest(Origin ~ ., data = tempdata3[,1:5], ntree = 100,
+                       na.action = na.omit, seed = 2014)
> testF$votes[1:10,]
USA non-USA
2 0.7027027 0.2972973
6 0.6571429 0.3428571
8 1.0000000 0.0000000
8.1 1.0000000 0.0000000
10 1.0000000 0.0000000
12 0.8947368 0.1052632
12.1 0.8918919 0.1081081
13 0.3636364 0.6363636
13.1 0.5128205 0.4871795
13.2 0.4571429 0.5428571
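One way to check whether the two calls are really equivalent is to look at which predictors the `.` in each formula expands to, since `.` means "every column of `data` not already named in the formula". The sketch below assumes a stand-in data frame, hypothetically named `tempdata`, built from `MASS::Cars93` with the same five columns as `tempdata3[,1:5]`; it is not the original resampled data. It uses only base R's `terms()`, so randomForest is not needed to run it.

```r
# Assumption: a stand-in for tempdata3[,1:5], built directly from MASS::Cars93
# (the question's tempdata3 is a resampled copy with an extra id column).
library(MASS)

tempdata <- Cars93[, c("Type", "MPG.highway", "Passengers", "Length", "Origin")]

# The response values are the same whichever way they are referenced ...
stopifnot(identical(tempdata$Origin, tempdata[, 5]))

# ... but "." is expanded against the column names of `data`, so compare
# the predictor sets that each formula actually produces.
by_position <- terms(tempdata[, 5] ~ ., data = tempdata)
by_name     <- terms(Origin ~ ., data = tempdata)

print(attr(by_position, "term.labels"))
print(attr(by_name, "term.labels"))
```

If `Origin` appears among the predictors for the positional formula but not for the named one, the first forest would be allowed to split on the response itself, which could account for the very different votes.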
Here is tempdata3:
> tempdata3
Type MPG.highway Passengers Length Origin id
2 Midsize 25 5 195 non-USA 2
6 Midsize 31 6 189 USA 6
8 Large 25 6 216 USA 8
8.1 Large 25 6 216 USA 8
10 Large 25 6 206 USA 10
12 Compact 36 5 182 USA 12
12.1 Compact 36 5 182 USA 12
13 Compact 34 5 184 USA 13
13.1 Compact 34 5 184 USA 13
13.2 Compact 34 5 184 USA 13
16 Van 23 7 178 USA 16
16.1 Van 23 7 178 USA 16
16.2 Van 23 7 178 USA 16
17 Van 20 8 194 USA 17
18 Large 26 6 214 USA 18
20 Large 28 6 203 USA 20
21 Compact 28 6 183 USA 21
21.1 Compact 28 6 183 USA 21
22 Large 26 6 203 USA 22
22.1 Large 26 6 203 USA 22
22.2 Large 26 6 203 USA 22
22.3 Large 26 6 203 USA 22
23 Small 33 5 174 USA 23
23.1 Small 33 5 174 USA 23
28 Sporty 24 4 180 USA 28
29 Small 33 5 174 USA 29
30 Large 28 6 202 USA 30
30.1 Large 28 6 202 USA 30
31 Small 33 4 141 USA 31
33 Compact 27 5 177 USA 33
33.1 Compact 27 5 177 USA 33
34 Sporty 29 4 180 USA 34
35 Sporty 30 4 179 USA 35
36 Van 20 7 176 USA 36
37 Midsize 30 5 192 USA 37
38 Large 26 6 212 USA 38
39 Small 50 4 151 non-USA 39
40 Sporty 36 4 164 non-USA 40
41 Sporty 31 4 175 non-USA 41
42 Small 46 4 173 non-USA 42
42.1 Small 46 4 173 non-USA 42
43 Compact 31 4 185 non-USA 43
43.1 Compact 31 4 185 non-USA 43
44 Small 33 5 168 non-USA 44
44.1 Small 33 5 168 non-USA 44
44.2 Small 33 5 168 non-USA 44
47 Midsize 27 5 184 non-USA 47
47.1 Midsize 27 5 184 non-USA 47
47.2 Midsize 27 5 184 non-USA 47
48 Midsize 22 5 200 non-USA 48
49 Midsize 24 5 188 non-USA 49
49.1 Midsize 24 5 188 non-USA 49
50 Midsize 23 4 191 non-USA 50
50.1 Midsize 23 4 191 non-USA 50
51 Midsize 26 6 205 USA 51
51.1 Midsize 26 6 205 USA 51
53 Small 37 4 164 non-USA 53
55 Compact 34 5 184 non-USA 55
55.1 Compact 34 5 184 non-USA 55
57 Sporty 25 2 169 non-USA 57
57.1 Sporty 25 2 169 non-USA 57
58 Compact 29 5 175 non-USA 58
60 Sporty 26 4 166 USA 60
63 Midsize 24 5 190 non-USA 63
66 Van 23 7 190 non-USA 66
66.1 Van 23 7 190 non-USA 66
66.2 Van 23 7 190 non-USA 66
66.3 Van 23 7 190 non-USA 66
68 Compact 31 5 188 USA 68
69 Midsize 31 5 190 USA 69
70 Van 23 7 194 USA 70
71 Large 28 6 201 USA 71
72 Sporty 30 4 173 USA 72
74 Compact 31 5 181 USA 74
74.1 Compact 31 5 181 USA 74
76 Midsize 27 5 195 USA 76
77 Large 28 6 177 USA 77
79 Small 38 5 176 USA 79
80 Small 37 4 146 non-USA 80
81 Small 30 5 175 non-USA 81
82 Compact 30 5 179 non-USA 82
84 Small 37 5 162 non-USA 84
85 Sporty 32 4 174 non-USA 85
86 Midsize 29 5 188 non-USA 86
86.1 Midsize 29 5 188 non-USA 86
88 Small 33 4 163 non-USA 88
89 Van 21 7 187 non-USA 89
91 Sporty 25 4 159 non-USA 91
91.1 Sporty 25 4 159 non-USA 91
92 Compact 28 5 190 non-USA 92
93 Midsize 28 5 184 non-USA 93
93.1 Midsize 28 5 184 non-USA 93