I am using the function randomForest() from the randomForest package in R. I would like to force randomForest() to combine attributes.
Suppose we have the following data frame:
> outlook = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast", "sunny" , "sunny", "rain", "sunny", "overcast", "overcast", "rain")
> humidity = c("high", "high", "high", "high", "normal", "normal", "normal", "high", "normal", "normal", "normal", "high", "normal", "high")
> wind = c("weak", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "strong")
> workload = c("high", "high", "moderate", "high", "moderate", "high", "high", "moderate", "moderate", "high", "high", "moderate", "high", "moderate")
> kid_game_night = c("no", "no", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no")
> play_tennis = c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
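> # R >= 4.0.0 no longer converts strings to factors by default, so request it explicitly
> tennis = data.frame(outlook, humidity, wind, workload, kid_game_night, play_tennis, stringsAsFactors = TRUE)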
> tennis
    outlook humidity   wind workload kid_game_night play_tennis
 1    sunny     high   weak     high             no          no
 2    sunny     high strong     high             no          no
 3 overcast     high   weak moderate            yes         yes
 4     rain     high   weak     high            yes         yes
 5     rain   normal   weak moderate             no         yes
 6     rain   normal strong     high             no          no
 7 overcast   normal strong     high             no         yes
 8    sunny     high   weak moderate            yes          no
 9    sunny   normal   weak moderate             no         yes
10     rain   normal   weak     high            yes         yes
11    sunny   normal strong     high             no         yes
12 overcast     high strong moderate             no         yes
13 overcast   normal   weak     high            yes         yes
14     rain     high strong moderate             no          no
To run randomForest() with the 5 attributes outlook, humidity, wind, workload and kid_game_night, one types:
> library(randomForest)
> tennis.rf = randomForest(x = tennis[, 1:5], y = as.factor(tennis$play_tennis))
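Now suppose I want to combine the attributes workload and kid_game_night into a single attribute (this new attribute would have 4 categories, since workload and kid_game_night each have 2). So instead of having randomForest() consider 5 attributes with 3, 2, 2, 2 and 2 categories respectively, it would consider 4 attributes with 3, 2, 2 and 4 categories respectively.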
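As a sketch of what the combined attribute would look like, base R's `interaction()` crosses two factors into one factor with a level for every combination. The snippet below assumes all columns are factors and uses the hypothetical names `wl_kid` and `x_combined`. It builds the reduced predictor set on the fly, leaving `tennis` itself unchanged.

```r
# Minimal sketch using base R only; rebuilds the example data from the question.
# stringsAsFactors = TRUE makes every column a factor (the default became
# FALSE in R 4.0.0).
tennis <- data.frame(
  outlook = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
              "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"),
  humidity = c("high", "high", "high", "high", "normal", "normal", "normal",
               "high", "normal", "normal", "normal", "high", "normal", "high"),
  wind = c("weak", "strong", "weak", "weak", "weak", "strong", "strong",
           "weak", "weak", "weak", "strong", "strong", "weak", "strong"),
  workload = c("high", "high", "moderate", "high", "moderate", "high", "high",
               "moderate", "moderate", "high", "high", "moderate", "high", "moderate"),
  kid_game_night = c("no", "no", "yes", "yes", "no", "no", "no",
                     "yes", "no", "yes", "no", "no", "yes", "no"),
  play_tennis = c("no", "no", "yes", "yes", "yes", "no", "yes",
                  "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = TRUE
)

# Categories per original predictor: outlook 3, humidity 2, wind 2,
# workload 2, kid_game_night 2
print(sapply(tennis[, 1:5], nlevels))

# interaction() crosses the two factors into a single factor with one level
# per combination (2 x 2 = 4 levels); `wl_kid` is a hypothetical name
wl_kid <- interaction(tennis$workload, tennis$kid_game_night, sep = "_")
print(levels(wl_kid))

# Predictor set with the combined attribute, built without modifying `tennis`
x_combined <- data.frame(tennis[, 1:3], wl_kid = wl_kid)
print(sapply(x_combined, nlevels))  # outlook 3, humidity 2, wind 2, wl_kid 4
```

The derived `x_combined` could then be passed as the `x` argument, for example `randomForest(x = x_combined, y = tennis$play_tennis)`. Because `interaction()` accepts any number of factors, the same pattern would extend to combining more than two attributes.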
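I know I could do this by manually modifying the input data set tennis, but that is exactly what I do not want to do. So my question is:
Is there a way to force randomForest() to combine 2 (or more) attributes without manually changing the input data set tennis?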