Can not handle categorical predictors with more than 53 categories in R

Time: 2018-12-17 14:42:20

Tags: r random-forest

I get the following error when running randomForest, even though none of my categorical variables has more than 53 levels.

> str(combineddata)
'data.frame':   16143 obs. of  13 variables:
 $ X          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Temperature: int  11 11 13 13 14 16 17 17 18 18 ...
 $ P0         : num  700 699 700 699 699 ...
 $ P          : num  764 763 763 763 762 ...
 $ U          : int  54 58 47 47 41 36 34 37 34 34 ...
 $ Ff         : int  5 3 4 4 4 5 4 6 7 7 ...
 $ VV         : num  16 16 16 16 16 16 16 16 16 16 ...
 $ Td         : int  2 3 2 2 1 1 1 2 2 2 ...
 $ T_g_5      : num  12.8 13.4 14.1 14.9 16 17.2 18 19.1 19.9 19.9 ...
 $ s_m_5      : num  0.182 0.184 0.184 0.187 0.185 0.192 0.193 0.19 0.193 0.195 ...
 $ DD         : Factor w/ 17 levels "Calm","Wind blowing from the east",..: 16 16 8 16 8 8 8 8 8 9 ...
 $ datetime   : chr  "2014-09-30 23:00:00" "2014-09-30 22:00:00" "2014-09-30 21:00:00" "2014-09-30 20:00:00" ...
 $ pest       : Factor w/ 8 levels "Bean Leaf Beetle",..: 7 7 7 7 7 7 7 7 7 7 ...
> rfModel <- randomForest(pest ~., data = pest_training)
Error in randomForest.default(m, y, ...) : 
  Can not handle categorical predictors with more than 53 categories.
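
One possible cause worth checking, offered as a sketch rather than a confirmed diagnosis: the str() output above lists datetime as chr, and R's model.frame() converts character variables to factors when a formula is used, so pest ~ . may end up treating every distinct timestamp as its own category. The snippet below counts distinct values in each non-numeric column. The data frame toy and the helper n_categories are hypothetical names made up for this illustration; the same check would need to be run on pest_training, the object actually passed to randomForest, which is not shown here.

```r
# Hypothetical stand-in for the training data: 60 hourly timestamps stored as text.
toy <- data.frame(
  Temperature = seq_len(60),
  datetime = format(
    as.POSIXct("2014-09-30 23:00:00", tz = "UTC") - 3600 * (0:59),
    "%Y-%m-%d %H:%M:%S"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct values in every non-numeric column.
n_categories <- function(df) {
  non_numeric <- !vapply(df, is.numeric, logical(1))
  vapply(df[non_numeric], function(col) length(unique(col)), integer(1))
}

counts <- n_categories(toy)
print(counts[counts > 53])  # columns that would exceed the 53-category limit

# One possible workaround (an assumption, not the only option):
# keep a numeric feature derived from the timestamp and drop the text column.
toy$datetime <- as.POSIXct(toy$datetime, tz = "UTC")
toy$hour <- as.integer(format(toy$datetime, "%H"))
toy$datetime <- NULL

remaining <- n_categories(toy)  # no non-numeric columns should be left
```

If datetime turns out to be the column over the limit, dropping it from the formula (for example pest ~ . - datetime) or replacing it with numeric features such as the hour is one way to proceed. Which features are meaningful depends on the data.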

Here is a minimal sample of my data.

> dput(head(combineddata))
structure(list(X = 1:6, Temperature = c(11L, 11L, 13L, 13L, 14L, 
16L), P0 = c(699.6, 699.4, 699.6, 699.4, 699.1, 699), P = c(763.5, 
763.3, 763, 762.8, 762.3, 761.7), U = c(54L, 58L, 47L, 47L, 41L, 
36L), Ff = c(5L, 3L, 4L, 4L, 4L, 5L), VV = c(16, 16, 16, 16, 
16, 16), Td = c(2L, 3L, 2L, 2L, 1L, 1L), T_g_5 = c(12.8, 13.4, 
14.1, 14.9, 16, 17.2), s_m_5 = c(0.182, 0.184, 0.184, 0.187, 
0.185, 0.192), DD = structure(c(16L, 16L, 8L, 16L, 8L, 8L), .Label = c("Calm", 
"Wind blowing from the east", "Wind blowing from the east-northeast", 
"Wind blowing from the east-southeast", "Wind blowing from the north", 
"Wind blowing from the north-east", "Wind blowing from the north-northeast", 
"Wind blowing from the north-northwest", "Wind blowing from the north-west", 
"Wind blowing from the south", "Wind blowing from the south-east", 
"Wind blowing from the south-southeast", "Wind blowing from the south-southwest", 
"Wind blowing from the south-west", "Wind blowing from the west", 
"Wind blowing from the west-northwest", "Wind blowing from the west-southwest"
), class = "factor"), datetime = c("2014-09-30 23:00:00", "2014-09-30 22:00:00", 
"2014-09-30 21:00:00", "2014-09-30 20:00:00", "2014-09-30 19:00:00", 
"2014-09-30 18:00:00"), pest = structure(c(7L, 7L, 7L, 7L, 7L, 
7L), .Label = c("Bean Leaf Beetle", "black cutworm", "Cercospora leaf blight", 
"Corn Earworm", "corn rootworm", "Green Clover Worm", "No threat", 
"Soybean Alphid"), class = "factor")), .Names = c("X", "Temperature", 
"P0", "P", "U", "Ff", "VV", "Td", "T_g_5", "s_m_5", "DD", "datetime", 
"pest"), row.names = c(NA, 6L), class = "data.frame")

I tried looking for a solution but could not resolve this. Thanks in advance.

0 answers:

No answers yet