Combining 2 attributes together with randomForest() in R

Time: 2016-03-17 19:32:51

Tags: r machine-learning classification random-forest

I am using the function randomForest() from the randomForest package in R. I would like to force randomForest() to combine attributes.

Suppose we have the following data frame:

> outlook = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast", "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain")
> humidity = c("high", "high", "high", "high", "normal", "normal", "normal", "high", "normal", "normal", "normal", "high", "normal", "high")
> wind = c("weak", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "weak", "weak", "strong", "strong", "weak", "strong")
> workload = c("high", "high", "moderate", "high", "moderate", "high", "high", "moderate", "moderate", "high", "high", "moderate", "high", "moderate")
> kid_game_night = c("no", "no", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no")
> play_tennis = c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
> # stringsAsFactors = TRUE keeps the columns as factors on R >= 4.0, where it is no longer the default
> tennis = data.frame(outlook, humidity, wind, workload, kid_game_night, play_tennis, stringsAsFactors = TRUE)
> tennis
    outlook humidity   wind workload kid_game_night play_tennis
1     sunny     high   weak     high             no          no
2     sunny     high strong     high             no          no
3  overcast     high   weak moderate            yes         yes
4      rain     high   weak     high            yes         yes
5      rain   normal   weak moderate             no         yes
6      rain   normal strong     high             no          no
7  overcast   normal strong     high             no         yes
8     sunny     high   weak moderate            yes          no
9     sunny   normal   weak moderate             no         yes
10     rain   normal   weak     high            yes         yes
11    sunny   normal strong     high             no         yes
12 overcast     high strong moderate             no         yes
13 overcast   normal   weak     high            yes         yes
14     rain     high strong moderate             no          no

To run randomForest() with the 5 attributes outlook, humidity, wind, workload and kid_game_night, one types:
  > library(randomForest)
  > tennis.rf = randomForest(x = tennis[, 1:5], y = as.factor(tennis$play_tennis))

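Now suppose I want to combine the attributes workload and kid_game_night into one single attribute (this new attribute would have 4 categories, since workload and kid_game_night have 2 categories each). So instead of randomForest() considering 5 separate attributes with 3, 2, 2, 2 and 2 categories respectively, it would consider 4 attributes with 3, 2, 2 and 4 categories.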
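To make the target concrete, here is a minimal sketch of what the combined attribute would look like, built with base R's interaction(). The name workload_kid is made up for illustration, and the two vectors are simply re-typed from the data above so the snippet runs on its own.

```r
# Same values as the workload and kid_game_night columns of tennis
workload <- c("high", "high", "moderate", "high", "moderate", "high", "high",
              "moderate", "moderate", "high", "high", "moderate", "high", "moderate")
kid_game_night <- c("no", "no", "yes", "yes", "no", "no", "no",
                    "yes", "no", "yes", "no", "no", "yes", "no")

# workload_kid is a hypothetical name: one factor whose levels are
# the workload/kid_game_night pairs
workload_kid <- interaction(workload, kid_game_night, sep = "_")

nlevels(workload_kid)  # 2 x 2 = 4 categories
table(workload_kid)    # how many rows fall into each pair
```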
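I know I could do this by manually modifying the input dataset tennis, but that is precisely what I do not want to do. Hence, my question is:
 Is there a way to force randomForest() to combine 2 (or more) attributes without manually altering the input dataset tennis?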
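
For reference, the closest workaround that can be sketched here leaves tennis untouched and builds the combined attribute in a temporary predictor frame that is handed to randomForest(). This is only a sketch: it assumes the randomForest package is installed, the names x_combined, workload_kid and tennis.rf2 are made up for illustration, and it is not a built-in option of randomForest(), which is what the question is really about.

```r
library(randomForest)  # assumes the randomForest package is installed

# Rebuild the example data so the snippet is self-contained
tennis <- data.frame(
  outlook = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
              "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"),
  humidity = c("high", "high", "high", "high", "normal", "normal", "normal",
               "high", "normal", "normal", "normal", "high", "normal", "high"),
  wind = c("weak", "strong", "weak", "weak", "weak", "strong", "strong",
           "weak", "weak", "weak", "strong", "strong", "weak", "strong"),
  workload = c("high", "high", "moderate", "high", "moderate", "high", "high",
               "moderate", "moderate", "high", "high", "moderate", "high", "moderate"),
  kid_game_night = c("no", "no", "yes", "yes", "no", "no", "no",
                     "yes", "no", "yes", "no", "no", "yes", "no"),
  play_tennis = c("no", "no", "yes", "yes", "yes", "no", "yes",
                  "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = TRUE
)

# Temporary predictor frame: outlook, humidity, wind plus the combined factor.
# tennis itself keeps its original 6 columns.
x_combined <- cbind(
  tennis[, 1:3],
  workload_kid = interaction(tennis$workload, tennis$kid_game_night, sep = "_")
)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
tennis.rf2 <- randomForest(x = x_combined, y = tennis$play_tennis)
```

With this, the forest sees 4 predictors with 3, 2, 2 and 4 categories, but the combination still happens outside randomForest() rather than inside it.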

0 answers:

No answers