Error: C5.0 models require a factor outcome

Time: 2017-10-04 15:28:51

Tags: r machine-learning

My data looks like this:

     IntensityRisk Depth  Mag  smaj  smin
           <fctr> <int>  <int> <int> <int>
 1             2     2     3     2     2
 2             3     1     3     2     2
 3             3     1     3     2     2
 4             3     1     1     2     2
 5             3     1     1     2     2
 6             2     2     3     2     2
 7             3     1     3     2     2
 8             3     1     3     2     2
 9             3     1     3     2     2
10             2     2     3     2     2

I ran the following steps:

gempaDF <- gempa[order(runif(nrow(gempa))),]
str(gempaDF$IntensityRisk)
tail(gempaDF,5)
gempaTrain <- gempaDF[1:4000,]
gempaTest <- gempaDF[4001:4471,]
C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,1])

and got this error:

Error in C5.0.default(gempaTrain[, -1], gempaTrain[, 1]) : 
  C5.0 models require a factor outcome

I changed it to:

C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,as.factor(gempaDF$IntensityRisk)])

and got a different error:

Error: Unsupported index type: factor

Then I tried converting the column to a factor before splitting:

gempaDF <- gempa[order(runif(nrow(gempa))),]
gempaDF$IntensityRisk <- as.factor(gempaDF$IntensityRisk)
str(gempaDF$IntensityRisk)
tail(gempaDF,5)
gempaTrain <- gempaDF[1:4000,]
gempaTest <- gempaDF[4001:4471,]
C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,1])

but I still get the same error:

Error in C5.0.default(gempaTrain[, -1], gempaTrain[, 1]) : 
  C5.0 models require a factor outcome

I also tried this:

C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,gempaDF$IntensityRisk])

but still got the error:

Error: Unsupported index type: factor

Does anyone know what I'm doing wrong? Any help would be much appreciated.

1 Answer

Answer (score: 0)

I'll use the following sample data (since I don't have access to yours):

set.seed(1)
dat = tibble::as_tibble(list(IntensityRisk = sample(1:5, 30, replace = TRUE),
                             Depth = sample(1:100, 30, replace = TRUE),
                             Mag = sample(1:100, 30, replace = TRUE)))

table(dat$IntensityRisk)
 1  2  3  4  5 
 4 11  2  7  6 


# convert the response to a factor

dat$IntensityRisk = as.factor(dat$IntensityRisk)

str(dat)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   30 obs. of  3 variables:
 $ IntensityRisk: Factor w/ 5 levels "1","2","3","4",..: 2 2 3 5 2 5 5 4 4 1 ...
 $ Depth        : int  49 60 50 19 83 67 80 11 73 42 ...
 $ Mag          : int  92 30 46 34 66 26 48 77 9 88 ...

If I keep the tibble, I get the same error:

fit = C50::C5.0(dat[, -1], dat[, 1])

Error in C5.0.default(dat[, -1], dat[, 1]) : 
  C5.0 models require a factor outcome 
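Checking what each kind of extraction returns shows where the error comes from. This is a sketch on a small hypothetical tibble and its data.frame copy (tb and df are illustrative names, and the tibble package is assumed to be installed):

```r
library(tibble)

# Small hypothetical example: factor outcome in column 1, one predictor
tb <- tibble(
  IntensityRisk = factor(c(2, 3, 3, 1)),
  Depth = c(2L, 1L, 1L, 1L)
)
df <- as.data.frame(tb)

# Tibble: `[` never drops, so column 1 comes back as a one-column tibble
is.factor(tb[, 1])                # FALSE, which is what trips C5.0's factor check
# `[[` and `$` return the column itself
is.factor(tb[[1]])                # TRUE
is.factor(tb$IntensityRisk)       # TRUE

# data.frame: `[` drops a single selected column to a vector by default
is.factor(df[, 1])                # TRUE
is.factor(df[, 1, drop = FALSE])  # FALSE, the same shape a tibble returns
```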

If I convert it to a plain data.frame instead:

dat1 = as.data.frame(dat)

str(dat1)

'data.frame':   30 obs. of  3 variables:
 $ IntensityRisk: Factor w/ 5 levels "1","2","3","4",..: 2 2 3 5 2 5 5 4 4 1 ...
 $ Depth        : int  49 60 50 19 83 67 80 11 73 42 ...
 $ Mag          : int  92 30 46 34 66 26 48 77 9 88 ...

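In that case dat1[, 1] is the factor column itself: [ on a plain data.frame drops a single selected column to a vector, whereas a tibble never drops, so dat[, 1] stays a one-column tibble and fails C5.0's factor check. The function now runs without error: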

fit = C50::C5.0(dat1[, -1], dat1[, 1])

> fit

Call:
C5.0.default(x = dat1[, -1], y = dat1[, 1])

Classification Tree
Number of samples: 30 
Number of predictors: 2 

Tree size: 8 

Non-standard options: attempt to group attributes
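Converting the whole tibble is one fix. Two alternatives that keep the tibble are to extract the outcome with `[[` (or `$`), or to use the formula interface, where `C5.0()` looks the outcome up by name in `data`. The sketch below assumes the C50 and tibble packages are installed. The data are a hypothetical random stand-in, so the fitted trees are meaningless, and the predictors in the x/y call are still passed through `as.data.frame()` as a precaution. Indexing with the outcome's values, as in `gempaTrain[, gempaDF$IntensityRisk]`, cannot work in either case, because the second index of `[` selects columns.

```r
library(C50)
library(tibble)

# Hypothetical random stand-in data with a factor outcome in column 1
set.seed(1)
dat <- tibble(
  IntensityRisk = factor(sample(1:3, 60, replace = TRUE)),
  Depth = sample(1:100, 60, replace = TRUE),
  Mag   = sample(1:100, 60, replace = TRUE)
)
train <- dat[1:50, ]
test  <- dat[51:60, ]

# Option 1: x/y interface, extracting the outcome as a factor with [[
fit_xy <- C5.0(x = as.data.frame(train[, -1]), y = train[[1]])

# Option 2: formula interface; model.frame() turns the tibble into a data.frame
fit_formula <- C5.0(IntensityRisk ~ ., data = train)

# Class predictions for the held-out rows (a factor with the outcome's levels)
pred <- predict(fit_formula, newdata = test)
table(predicted = pred, actual = test$IntensityRisk)
```

Applied to the question's objects, the same two options would read C5.0(as.data.frame(gempaTrain[, -1]), gempaTrain[[1]]) and C5.0(IntensityRisk ~ ., data = gempaTrain).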