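This question has been asked before, but not answered in a way that solves my problem. My question is also slightly different.
I am trying to build a decision tree model using the C50 package. I am trying to predict whether an MMA fighter has championship potential (a factor with two levels, yes/no).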
This column was originally a boolean, but I converted it to a factor using
fighters_clean$championship_potential <- as.factor(fighters_clean$championship_potential)
table(fighters_clean$championship_potential)
#Rename binary outcome
fighters_clean$championship_potential <- factor(fighters_clean$championship_potential,
levels = c("TRUE", "FALSE"), labels = c("YES", "NO"))
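My data frame shows it as a "Factor w/ 2 levels", which should work as the class variable for a C5.0 decision tree, but I keep getting this error message.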
Error in UseMethod("QuinlanAttributes") :
no applicable method for 'QuinlanAttributes' applied to an object of class "logical"
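Since the message names class "logical", one way to narrow this down is to list the class of every column and pick out the logical ones. The snippet below is only a sketch on a made-up miniature data frame (`df` and its values are hypothetical; the column names mirror my data):

```r
# Hypothetical miniature of the data; values are invented for illustration.
df <- data.frame(
  name = c("A", "B", "C"),
  SLpM = c(1.7, 1.1, 2.9),
  championship_potential = factor(c("NO", "NO", "YES"), levels = c("YES", "NO")),
  contender = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# First class of each column, as a named character vector.
col_classes <- vapply(df, function(col) class(col)[1], character(1))
print(col_classes)

# Names of the columns that are still logical.
logical_cols <- names(df)[vapply(df, is.logical, logical(1))]
print(logical_cols)
```

Running the same two `vapply()` calls on the real predictor data frame would show whether any logical column is still being passed to the model.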
Below is the code for my model.
#Let's use a decision tree to see which fighters have championship potential
table(fighters_clean$championship_potential)
#FALSE TRUE
#2578 602
#Create test and training data
#set.seed() fixes the state of the random number generator so the results are random but repeatable; the number itself is arbitrary.
set.seed(123)
Tree_training <- sample(3187, 2868)
str(Tree_training)
#This creates a vector of 2868 distinct random integers drawn from 1:3187 (row indices).
#We use this vector to split our data into training and test data.
#It should be a representative 90/10 split.
Tree_Train <- fighters_clean[Tree_training, ]
Tree_Test <- fighters_clean[-Tree_training, ]
#That worked, sweet.
#Now let's see if they are representative.
#There should be a similar proportion of champ potential in both data sets.
prop.table(table(Tree_Train$championship_potential))
prop.table(table(Tree_Test$championship_potential))
#Awesome, so that's a well-balanced split, with each data set having 18% champions.
#C5.0 is a decision tree algorithm that is available in R through the C50 package.
#We will use this to build a decision tree.
str(Tree_Train)
'data.frame': 2868 obs. of 12 variables:
$ name : chr "Jesse Juarez" "Milton Vieira" "Joey Gomez" "Gilbert Smith" ...
$ SLpM : num 1.71 1.13 2.93 1.09 5.92 0 0 1.2 0 2.11 ...
$ Str_Acc : num 48 35 35 41 51 0 0 33 0 50 ...
$ SApM : num 2.87 2.36 4.03 2.73 3.6 0 0 1.73 0 1.89 ...
$ Str_Def : num 52 48 53 35 55 0 0 73 0 63 ...
$ TD_Avg : num 2.69 2.67 1.15 3.51 0.44 0 0 0 0 0.19 ...
$ TD_Acc : num 33 53 37 60 33 0 0 0 0 40 ...
$ TD_Def : num 50 12 50 0 70 0 0 50 0 78 ...
$ Sub_Avg : num 0 0.7 0 1.2 0.4 0 0 0 0 0.3 ...
$ Win_percentage : num 0.667 0.565 0.875 0.714 0.8 ...
$ championship_potential: Factor w/ 2 levels "YES","NO": 2 2 1 2 2 2 1 2 2 2 ...
$ contender : logi FALSE FALSE TRUE TRUE TRUE TRUE ...
library(C50)
DTModel <- C5.0(Tree_Train [-11], Tree_Train$championship_potential, trials = 1, costs = NULL)
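A possible way around the error, sketched under assumptions: the message names class "logical", the `str()` output above lists `contender` as `logi`, and `Tree_Train[-11]` only drops the outcome column, so `contender` is the likely trigger. Converting every logical predictor to a factor before calling `C5.0()` may resolve it. The data frame `toy` and its values are invented stand-ins for `Tree_Train`, and the model call is skipped if C50 is not installed:

```r
# Toy stand-in for Tree_Train; all values are made up for illustration.
set.seed(123)
toy <- data.frame(
  SLpM = runif(200, 0, 6),
  Win_percentage = runif(200),
  contender = sample(c(TRUE, FALSE), 200, replace = TRUE)
)
toy$championship_potential <- factor(
  ifelse(toy$Win_percentage > 0.7, "YES", "NO"),
  levels = c("YES", "NO")
)

# Convert each logical column to a two-level factor.
is_lgl <- vapply(toy, is.logical, logical(1))
toy[is_lgl] <- lapply(
  toy[is_lgl], factor,
  levels = c("TRUE", "FALSE"), labels = c("YES", "NO")
)

# Select predictors by name rather than by position.
predictors <- toy[, setdiff(names(toy), "championship_potential"), drop = FALSE]

# Fit the tree only when the C50 package is available.
if (requireNamespace("C50", quietly = TRUE)) {
  model <- C50::C5.0(x = predictors, y = toy$championship_potential, trials = 1)
  print(model)
}
```

Dropping the outcome by name instead of `[-11]` is a design choice here: it keeps working if columns are later added or reordered.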