C5.0 algorithm not working because of a logical column: solutions?

Time: 2019-04-23 22:44:10

Tags: r machine-learning decision-tree c5.0

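This question has been asked before, but none of the answers solved my problem, and my case is also slightly different.

I am trying to build a decision tree model with the C50 package. I want to predict whether an MMA fighter has championship potential, which is a two-level (YES/NO) factor derived from a logical column.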

This column was originally a logical (boolean), but I converted it to a factor using:

fighters_clean$championship_potential <- as.factor(fighters_clean$championship_potential)
table(fighters_clean$championship_potential)
#Rename binary outcome
fighters_clean$championship_potential <- factor(fighters_clean$championship_potential, 
                                                levels = c("TRUE", "FALSE"), labels = c("YES", "NO"))

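In my data frame the column shows up as "Factor w/ 2 levels", so it should work as the class variable for a C5.0 decision tree, yet I keep getting this error message.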

Error in UseMethod("QuinlanAttributes") : 
  no applicable method for 'QuinlanAttributes' applied to an object of class "logical"
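
The error names class "logical", so a reasonable first check is which columns of the data frame are still logical. A minimal sketch on a made-up data frame follows; `toy` and its values are hypothetical, and only the column names echo the ones used in this question.

```r
# Hypothetical toy data; only the column names mirror fighters_clean.
toy <- data.frame(
  SLpM = c(1.71, 1.13, 2.93),
  championship_potential = factor(c("NO", "NO", "YES"), levels = c("YES", "NO")),
  contender = c(FALSE, FALSE, TRUE)
)

# Class of every column, then the names of the columns that are logical.
col_classes <- sapply(toy, class)
logical_cols <- names(toy)[sapply(toy, is.logical)]

print(col_classes)
print(logical_cols)
```

Running the same two `sapply()` calls on the data frame that is actually passed to `C5.0()` would show whether any logical column reaches the model.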

Below is the code for my model.

#Let's use a decision tree to see which fighters have championship potential
table(fighters_clean$championship_potential)
#FALSE  TRUE 
#2578   602 

#create test and training data 
#set seed alters the random number generator so that it is random but repeatable, the number is arbitrary.
set.seed(123)
Tree_training <- sample(3187, 2868)
str(Tree_training)

#So what this does is it creates a vector of 2868 random integers. 
#We use this vector to split our data into training and test data
#it should be a representative 90/10 split. 

Tree_Train <- fighters_clean[Tree_training, ]
Tree_Test <- fighters_clean[-Tree_training, ]

#That worked, sweet.
#Now let's see if they are representative.
#The proportion of championship potential should be similar in both data sets.

prop.table(table(Tree_Train$championship_potential))
prop.table(table(Tree_Test$championship_potential))

#Awesome, so that's a good split, with each data set having about 18% champions.
#C5.0 is a decision tree algorithm available in R through the C50 package.
#We will use this to build a decision tree.
str(Tree_Train)

'data.frame':   2868 obs. of  12 variables:
 $ name                  : chr  "Jesse Juarez" "Milton Vieira" "Joey Gomez" "Gilbert Smith" ...
 $ SLpM                  : num  1.71 1.13 2.93 1.09 5.92 0 0 1.2 0 2.11 ...
 $ Str_Acc               : num  48 35 35 41 51 0 0 33 0 50 ...
 $ SApM                  : num  2.87 2.36 4.03 2.73 3.6 0 0 1.73 0 1.89 ...
 $ Str_Def               : num  52 48 53 35 55 0 0 73 0 63 ...
 $ TD_Avg                : num  2.69 2.67 1.15 3.51 0.44 0 0 0 0 0.19 ...
 $ TD_Acc                : num  33 53 37 60 33 0 0 0 0 40 ...
 $ TD_Def                : num  50 12 50 0 70 0 0 50 0 78 ...
 $ Sub_Avg               : num  0 0.7 0 1.2 0.4 0 0 0 0 0.3 ...
 $ Win_percentage        : num  0.667 0.565 0.875 0.714 0.8 ...
 $ championship_potential: Factor w/ 2 levels "YES","NO": 2 2 1 2 2 2 1 2 2 2 ...
 $ contender             : logi  FALSE FALSE TRUE TRUE TRUE TRUE ...


library(C50)
DTModel <- C5.0(Tree_Train [-11], Tree_Train$championship_potential, trials = 1, costs = NULL)
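
One possible explanation, going by the `str()` output above: `Tree_Train[-11]` drops the outcome column but still passes `contender`, which is `logi`, as a predictor. The sketch below converts logical predictors to factors before fitting. `to_factor_predictors` is a hypothetical helper, the toy data is made up, and whether this resolves the error on the real data is an assumption rather than a confirmed fix.

```r
# Hypothetical helper: turn every logical column into a two-level factor.
to_factor_predictors <- function(df) {
  is_lgl <- vapply(df, is.logical, logical(1))
  for (col in names(df)[is_lgl]) {
    df[[col]] <- factor(df[[col]], levels = c("FALSE", "TRUE"))
  }
  df
}

# Made-up data with one logical predictor and a two-level factor outcome.
set.seed(123)
toy <- data.frame(
  SLpM = runif(40, 0, 6),
  Win_percentage = runif(40),
  contender = rep(c(TRUE, FALSE), 20),
  championship_potential = factor(rep(c("YES", "NO"), each = 20),
                                  levels = c("YES", "NO"))
)

# Drop the outcome by name, then convert the remaining logical columns.
predictors <- to_factor_predictors(toy[names(toy) != "championship_potential"])
str(predictors)

# Fit only if the C50 package is installed.
if (requireNamespace("C50", quietly = TRUE)) {
  toy_model <- C50::C5.0(predictors, toy$championship_potential, trials = 1)
  print(toy_model)
}
```

On the real data the equivalent would be to apply the conversion to `Tree_Train[-11]` before the `C5.0()` call. Dropping the `name` column as well may be sensible, since it identifies fighters rather than describing them.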

0 answers:

No answers