Even though none of my categorical variables has more than 53 levels, I still get the following error when running randomForest.
> str(combineddata)
'data.frame': 16143 obs. of 13 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ Temperature: int 11 11 13 13 14 16 17 17 18 18 ...
$ P0 : num 700 699 700 699 699 ...
$ P : num 764 763 763 763 762 ...
$ U : int 54 58 47 47 41 36 34 37 34 34 ...
$ Ff : int 5 3 4 4 4 5 4 6 7 7 ...
$ VV : num 16 16 16 16 16 16 16 16 16 16 ...
$ Td : int 2 3 2 2 1 1 1 2 2 2 ...
$ T_g_5 : num 12.8 13.4 14.1 14.9 16 17.2 18 19.1 19.9 19.9 ...
$ s_m_5 : num 0.182 0.184 0.184 0.187 0.185 0.192 0.193 0.19 0.193 0.195 ...
$ DD : Factor w/ 17 levels "Calm","Wind blowing from the east",..: 16 16 8 16 8 8 8 8 8 9 ...
$ datetime : chr "2014-09-30 23:00:00" "2014-09-30 22:00:00" "2014-09-30 21:00:00" "2014-09-30 20:00:00" ...
$ pest : Factor w/ 8 levels "Bean Leaf Beetle",..: 7 7 7 7 7 7 7 7 7 7 ...
> rfModel <- randomForest(pest ~., data = pest_training)
Error in randomForest.default(m, y, ...) :
Can not handle categorical predictors with more than 53 categories.
Here is a minimal sample of my data.
> dput(head(combineddata))
structure(list(X = 1:6, Temperature = c(11L, 11L, 13L, 13L, 14L,
16L), P0 = c(699.6, 699.4, 699.6, 699.4, 699.1, 699), P = c(763.5,
763.3, 763, 762.8, 762.3, 761.7), U = c(54L, 58L, 47L, 47L, 41L,
36L), Ff = c(5L, 3L, 4L, 4L, 4L, 5L), VV = c(16, 16, 16, 16,
16, 16), Td = c(2L, 3L, 2L, 2L, 1L, 1L), T_g_5 = c(12.8, 13.4,
14.1, 14.9, 16, 17.2), s_m_5 = c(0.182, 0.184, 0.184, 0.187,
0.185, 0.192), DD = structure(c(16L, 16L, 8L, 16L, 8L, 8L), .Label = c("Calm",
"Wind blowing from the east", "Wind blowing from the east-northeast",
"Wind blowing from the east-southeast", "Wind blowing from the north",
"Wind blowing from the north-east", "Wind blowing from the north-northeast",
"Wind blowing from the north-northwest", "Wind blowing from the north-west",
"Wind blowing from the south", "Wind blowing from the south-east",
"Wind blowing from the south-southeast", "Wind blowing from the south-southwest",
"Wind blowing from the south-west", "Wind blowing from the west",
"Wind blowing from the west-northwest", "Wind blowing from the west-southwest"
), class = "factor"), datetime = c("2014-09-30 23:00:00", "2014-09-30 22:00:00",
"2014-09-30 21:00:00", "2014-09-30 20:00:00", "2014-09-30 19:00:00",
"2014-09-30 18:00:00"), pest = structure(c(7L, 7L, 7L, 7L, 7L,
7L), .Label = c("Bean Leaf Beetle", "black cutworm", "Cercospora leaf blight",
"Corn Earworm", "corn rootworm", "Green Clover Worm", "No threat",
"Soybean Alphid"), class = "factor")), .Names = c("X", "Temperature",
"P0", "P", "U", "Ff", "VV", "Td", "T_g_5", "s_m_5", "DD", "datetime",
"pest"), row.names = c(NA, 6L), class = "data.frame")
I have searched for a solution but have not been able to resolve this. Thanks in advance.