How to deal with "rank-deficient fit may be misleading" in R?

Time: 2016-07-14 20:08:30

Tags: r

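I am trying to predict the values of a test data set using a model fitted on a training data set. The prediction runs without errors, but the predicted values deviate A LOT from the original values. It even predicts values around -356, although none of the original values exceeds 200 (and none is negative). The warning below bothers me, because I think it is the reason the predictions are so far off.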

Warning message:
In predict.lm(fit2, data_test) :
  prediction from a rank-deficient fit may be misleading

Is there any way to get rid of this warning? The code is simple:

fit2 <- lm(runs~., data=train_data)
prediction<-predict(fit2, data_test)
prediction

I have searched a lot, but I do not really understand this warning. Here are the training and test data sets, in case anyone needs them:

> str(train_data)
'data.frame':   36 obs. of  28 variables:
 $ matchid                  : int  57 58 55 56 53 54 51 52 45 46 ...
 $ TeamName                 : chr  "South Africa" "West Indies" "South Africa" "West Indies" ...
 $ Opp_TeamName             : chr  "West Indies" "South Africa" "West Indies" "South Africa" ...
 $ TeamRank                 : int  4 3 4 3 4 3 10 7 5 1 ...
 $ Opp_TeamRank             : int  3 4 3 4 3 4 7 10 1 5 ...
 $ Team_Top10RankingBatsman : int  0 1 0 1 0 1 0 0 2 2 ...
 $ Team_Top50RankingBatsman : int  4 6 4 6 4 6 3 5 4 3 ...
 $ Team_Top100RankingBatsman: int  6 8 6 8 6 8 7 7 7 6 ...
 $ Opp_Top10RankingBatsman  : int  1 0 1 0 1 0 0 0 2 2 ...
 $ Opp_Top50RankingBatsman  : int  6 4 6 4 6 4 5 3 3 4 ...
 $ Opp_Top100RankingBatsman : int  8 6 8 6 8 6 7 7 6 7 ...
 $ InningType               : chr  "1st innings" "2nd innings" "1st innings" "2nd innings" ...
 $ Runs_OverAll             : num  361 705 348 630 347 ...
 $ AVG_Overall              : num  27.2 20 23.3 19.1 24 ...
 $ SR_Overall               : num  128 121 120 118 118 ...
 $ Runs_Last10Matches       : num  118.5 71 102.1 71 78.6 ...
 $ AVG_Last10Matches        : num  23.7 20.4 20.9 20.4 23.2 ...
 $ SR_Last10Matches         : num  120 106 114 106 116 ...
 $ Runs_BatingFirst         : num  236 459 230 394 203 ...
 $ AVG_BatingFirst          : num  30.6 23.2 24 21.2 27.1 ...
 $ SR_BatingFirst           : num  127 136 123 125 118 ...
 $ Runs_BatingSecond        : num  124 262 119 232 144 ...
 $ AVG_BatingSecond         : num  25.5 18.3 22.8 17.8 22.8 ...
 $ SR_BatingSecond          : num  125 118 112 117 114 ...
 $ Runs_AgainstTeam2        : num  88.3 118.3 76.3 103.9 49.3 ...
 $ AVG_AgainstTeam2         : num  28.2 23 24.7 22.1 16.4 ...
 $ SR_AgainstTeam2          : num  139 127 131 128 111 ...
 $ runs                     : int  165 168 231 236 195 126 143 141 191 135 ...
> str(data_test)
'data.frame':   34 obs. of  28 variables:
 $ matchid                  : int  59 60 61 62 63 64 65 66 69 70 ...
 $ TeamName                 : chr  "India" "West Indies" "England" "New Zealand" ...
 $ Opp_TeamName             : chr  "West Indies" "India" "New Zealand" "England" ...
 $ TeamRank                 : int  2 3 5 1 4 8 6 2 10 1 ...
 $ Opp_TeamRank             : int  3 2 1 5 8 4 2 6 1 10 ...
 $ Team_Top10RankingBatsman : int  1 1 2 2 0 0 1 1 0 2 ...
 $ Team_Top50RankingBatsman : int  5 6 4 3 4 2 5 5 3 3 ...
 $ Team_Top100RankingBatsman: int  7 8 7 6 6 5 7 7 7 6 ...
 $ Opp_Top10RankingBatsman  : int  1 1 2 2 0 0 1 1 2 0 ...
 $ Opp_Top50RankingBatsman  : int  6 5 3 4 2 4 5 5 3 3 ...
 $ Opp_Top100RankingBatsman : int  8 7 6 7 5 6 7 7 6 7 ...
 $ InningType               : chr  "1st innings" "2nd innings" "2nd innings" "1st innings" ...
 $ Runs_OverAll             : num  582 618 470 602 509 ...
 $ AVG_Overall              : num  25 21.8 20.3 20.7 19.6 ...
 $ SR_Overall               : num  113 120 123 120 112 ...
 $ Runs_Last10Matches       : num  182 107 117 167 140 ...
 $ AVG_Last10Matches        : num  37.1 43.8 21 24.9 27.3 ...
 $ SR_Last10Matches         : num  111 153 122 141 120 ...
 $ Runs_BatingFirst         : num  319 314 271 345 294 ...
 $ AVG_BatingFirst          : num  23.6 17.8 20.6 20.3 19.5 ...
 $ SR_BatingFirst           : num  116.9 98.5 118 124.3 115.8 ...
 $ Runs_BatingSecond        : num  264 282 304 256 186 ...
 $ AVG_BatingSecond         : num  28 23.7 31.9 21.6 16.5 ...
 $ SR_BatingSecond          : num  96.5 133.9 129.4 112 99.5 ...
 $ Runs_AgainstTeam2        : num  98.2 95.2 106.9 75.4 88.5 ...
 $ AVG_AgainstTeam2         : num  45.3 42.7 38.1 17.7 27.1 ...
 $ SR_AgainstTeam2          : num  125 138 152 110 122 ...
 $ runs                     : int  192 196 159 153 122 120 160 161 70 145 ...

Simply put, how can I get rid of this warning so that it does not affect my predictions?

(Intercept)                   matchid        TeamNameBangladesh 
            1699.98232628               -0.06793787               59.29445330 
          TeamNameEngland             TeamNameIndia       TeamNameNew Zealand 
             347.33030177             -499.40074338             -179.19192936 
         TeamNamePakistan      TeamNameSouth Africa         TeamNameSri Lanka 
            -272.71610614               -3.54867488              -45.27920191 
      TeamNameWest Indies    Opp_TeamNameBangladesh       Opp_TeamNameEngland 
            -345.54349798              135.05901017              108.04227770 
        Opp_TeamNameIndia   Opp_TeamNameNew Zealand      Opp_TeamNamePakistan 
            -162.24418387              -60.55364436             -114.74599364 
 Opp_TeamNameSouth Africa     Opp_TeamNameSri Lanka   Opp_TeamNameWest Indies 
             196.90856999              150.70170068               -6.88997714 
                 TeamRank              Opp_TeamRank  Team_Top10RankingBatsman 
                       NA                        NA                        NA 
 Team_Top50RankingBatsman Team_Top100RankingBatsman   Opp_Top10RankingBatsman 
                       NA                        NA                        NA 
  Opp_Top50RankingBatsman  Opp_Top100RankingBatsman     InningType2nd innings 
                       NA                        NA               24.24029455 
             Runs_OverAll               AVG_Overall                SR_Overall 
              -0.59935875               20.12721378              -13.60151334 
       Runs_Last10Matches         AVG_Last10Matches          SR_Last10Matches 
              -1.92526750                9.24182916                1.23914363 
         Runs_BatingFirst           AVG_BatingFirst            SR_BatingFirst 
               1.41001672               -9.88582744               -6.69780509 
        Runs_BatingSecond          AVG_BatingSecond           SR_BatingSecond 
              -0.90038727               -7.11580086                3.20915976 
        Runs_AgainstTeam2          AVG_AgainstTeam2           SR_AgainstTeam2 
               3.35936312               -5.90267210                2.36899131 

1 answer:

Answer 0: (score: 0)

You can take a look at this detailed discussion: predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading
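To make the warning concrete, here is a minimal sketch that reproduces it on invented toy data (the names toy, new_toy, x1, x2, x3 are illustrative and not from the question). It shows the same symptom as the coefficient table in the question: columns that are exact linear combinations of others get an NA coefficient. In the question the eight NA entries (TeamRank, Opp_TeamRank and the Top10/Top50/Top100 counts) are presumably fixed per team and therefore already determined by the TeamName and Opp_TeamName dummy variables, but that is a guess from the output, not something the data dump proves. Note also that the exact wording of the warning, and when it is raised, may differ between R versions.

```r
# Toy data (hypothetical): x3 is an exact linear combination of x1 and x2,
# so the design matrix does not have full column rank.
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$x3 <- toy$x1 + toy$x2
toy$y  <- 1 + 2 * toy$x1 - toy$x2 + rnorm(20, sd = 0.1)

fit <- lm(y ~ ., data = toy)

# Coefficients that lm() could not estimate are reported as NA.
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# The fitted rank is smaller than the number of coefficients.
print(c(rank = fit$rank, n_coef = length(coef(fit))))

# New row that breaks the relation x3 = x1 + x2; predicting for it
# triggers the rank-deficiency warning, which we capture as text.
new_toy <- data.frame(x1 = 0.5, x2 = -0.5, x3 = 5)
msg <- tryCatch(
  { predict(fit, new_toy); NA_character_ },
  warning = function(w) conditionMessage(w)
)
print(msg)
```

On the model from the question, the same check would be names(coef(fit2))[is.na(coef(fit2))]; alias(fit2) gives a more detailed view of which columns depend on which. Suppressing the warning would not change the predictions, so the useful step is to remove or combine the redundant predictors.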

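In general, multicollinearity can lead to a rank-deficient design matrix in a regression model (here, the linear model fitted with lm()). You can try applying PCA to remove the multicollinearity and then fit the regression on the principal components.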
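The PCA suggestion can be sketched as follows. This is only an illustration on invented toy data, not the asker's data set: the objects dat, train, test, predictors and the choice k = 2 are assumptions made for the example. The PCA is fitted on the numeric training predictors only, and the test rows are projected onto the same components with predict() before the linear model is applied.

```r
# Hypothetical toy data with exact (x3) and near (x4) collinearity.
set.seed(42)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(
  x1 = x1,
  x2 = x2,
  x3 = x1 + x2,                   # exact linear combination of x1 and x2
  x4 = x1 + rnorm(n, sd = 0.05)   # nearly collinear with x1
)
dat$y <- 3 + 2 * x1 - x2 + rnorm(n, sd = 0.2)

train <- dat[1:30, ]
test  <- dat[31:40, ]
predictors <- c("x1", "x2", "x3", "x4")

# Fit the PCA on the training predictors only.
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)

# Assumption: k is picked by hand here; in practice it would be chosen
# by cross-validation or from the explained variance.
k <- 2
train_pc <- data.frame(y = train$y, pca$x[, 1:k, drop = FALSE])
test_pc  <- data.frame(
  predict(pca, newdata = test[, predictors])[, 1:k, drop = FALSE]
)

# Regress on the components; they are uncorrelated, so the fit has full rank.
pcr_fit  <- lm(y ~ ., data = train_pc)
pcr_pred <- predict(pcr_fit, newdata = test_pc)
print(coef(pcr_fit))
```

For the question's data this would apply to the numeric columns only; the character columns (TeamName, Opp_TeamName, InningType) would need separate handling. With 36 training rows and 42 coefficients in the original model (34 of them estimated), reducing the number of predictors matters at least as much as removing the collinearity.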