My data looks like this:
   IntensityRisk Depth   Mag  smaj  smin
          <fctr> <int> <int> <int> <int>
 1             2     2     3     2     2
 2             3     1     3     2     2
 3             3     1     3     2     2
 4             3     1     1     2     2
 5             3     1     1     2     2
 6             2     2     3     2     2
 7             3     1     3     2     2
 8             3     1     3     2     2
 9             3     1     3     2     2
10             2     2     3     2     2
I did the following:
library(C50)
gempaDF <- gempa[order(runif(nrow(gempa))),]   # shuffle the rows
str(gempaDF$IntensityRisk)
tail(gempaDF,5)
gempaTrain <- gempaDF[1:4000,]                 # first 4000 rows for training
gempaTest <- gempaDF[4001:4471,]               # remaining 471 rows for testing
C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,1])
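For reference, the shuffle-and-split step can also be written with `sample()`, which returns a random permutation of the row indices directly. This is a minimal sketch only, assuming a small made-up stand-in for `gempa` (the real data isn't available here) and a roughly 90/10 split in place of the hard-coded 4000/4471 row numbers:

```r
# Hypothetical stand-in for `gempa`; the real data set isn't available here.
set.seed(42)
gempa <- data.frame(
  IntensityRisk = factor(sample(1:3, 100, replace = TRUE)),
  Depth = sample(1:3, 100, replace = TRUE),
  Mag   = sample(1:3, 100, replace = TRUE)
)

# Shuffle the rows: sample(n) returns a random permutation of 1:n.
gempaDF <- gempa[sample(nrow(gempa)), ]

# First ~90% of the shuffled rows for training, the rest for testing,
# computed from nrow() instead of hard-coded row numbers.
n_train <- floor(0.9 * nrow(gempaDF))
gempaTrain <- gempaDF[seq_len(n_train), ]
gempaTest  <- gempaDF[-seq_len(n_train), ]
```

Computing the split size from `nrow()` keeps the code working if the number of rows changes.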
and got this error:
Error in C5.0.default(gempaTrain[, -1], gempaTrain[, 1]) :
C5.0 models require a factor outcome
I changed it to:
C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,as.factor(gempaDF$IntensityRisk)])
and got another error:
Error: Unsupported index type: factor
Then I tried changing it to:
gempaDF <- gempa[order(runif(nrow(gempa))),]
gempaDF$IntensityRisk <- as.factor(gempaDF$IntensityRisk)
str(gempaDF$IntensityRisk)
tail(gempaDF,5)
gempaTrain <- gempaDF[1:4000,]
gempaTest <- gempaDF[4001:4471,]
C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,1])
but I still get this error:
Error in C5.0.default(gempaTrain[, -1], gempaTrain[, 1]) :
C5.0 models require a factor outcome
I also tried this:
C50_model <- C5.0(gempaTrain[,-1], gempaTrain[,gempaDF$IntensityRisk])
but I still get the error:
Error: Unsupported index type: factor
Does anyone know what I'm doing wrong? Any help would be greatly appreciated.
Answer:
I'll use the following sample data (since I don't have access to yours):
set.seed(1)
dat = tibble::as_tibble(list(
  IntensityRisk = sample(1:5, 30, replace = TRUE),
  Depth = sample(1:100, 30, replace = TRUE),
  Mag = sample(1:100, 30, replace = TRUE)
))
table(dat$IntensityRisk)
1 2 3 4 5
4 11 2 7 6
# convert the response to a factor
dat$IntensityRisk = as.factor(dat$IntensityRisk)
str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 30 obs. of 3 variables:
$ IntensityRisk: Factor w/ 5 levels "1","2","3","4",..: 2 2 3 5 2 5 5 4 4 1 ...
$ Depth : int 49 60 50 19 83 67 80 11 73 42 ...
$ Mag : int 92 30 46 34 66 26 48 77 9 88 ...
If I keep the data as a tibble, I get the same error:
fit = C50::C5.0(dat[, -1], dat[, 1])
Error in C5.0.default(dat[, -1], dat[, 1]) :
C5.0 models require a factor outcome
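The error is about what `[, 1]` returns, not about the type of `IntensityRisk`. To see the difference in isolation, the sketch below uses only a plain data.frame, with `drop = FALSE` standing in for the tibble behaviour (a tibble's `[` never drops a single column to a vector), so it runs without tibble or C50 installed. The toy data frame `df` is made up for illustration:

```r
# Made-up toy data; the response is already a factor.
df <- data.frame(
  IntensityRisk = factor(c(2, 3, 3, 2)),
  Depth = c(2L, 1L, 1L, 2L)
)

# Plain data.frame: `[, 1]` drops to the column itself, a factor vector.
y_drop <- df[, 1]
class(y_drop)        # "factor"

# `drop = FALSE` keeps a one-column data frame, which is what a tibble's
# `[, 1]` always returns. A data frame is not a factor, hence the C5.0 error.
y_keep <- df[, 1, drop = FALSE]
class(y_keep)        # "data.frame"

# `[[` (or `$`) extracts the column vector from data frames and tibbles alike.
y_vec <- df[[1]]
is.factor(y_vec)     # TRUE
```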
If I convert it to a plain data.frame,
dat1 = as.data.frame(dat)
str(dat1)
'data.frame': 30 obs. of 3 variables:
$ IntensityRisk: Factor w/ 5 levels "1","2","3","4",..: 2 2 3 5 2 5 5 4 4 1 ...
$ Depth : int 49 60 50 19 83 67 80 11 73 42 ...
$ Mag : int 92 30 46 34 66 26 48 77 9 88 ...
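the function runs without error, because on a data.frame `[, 1]` returns the factor column itself, whereas on a tibble it returns a one-column tibble: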
fit = C50::C5.0(dat1[, -1], dat1[, 1])
> fit
Call:
C5.0.default(x = dat1[, -1], y = dat1[, 1])
Classification Tree
Number of samples: 30
Number of predictors: 2
Tree size: 8
Non-standard options: attempt to group attributes
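Converting to a data.frame works, but you can also keep the tibble and pass the outcome as a vector with `[[1]]` (or `$IntensityRisk`), or use the formula interface. The sketch below applies this to the question's setup; `gempaTrain` is replaced by a made-up stand-in (a plain data.frame here so it runs without tibble), and the `C5.0()` calls only run if the C50 package is installed:

```r
# Made-up stand-in for the question's training set.
set.seed(1)
gempaTrain <- data.frame(
  IntensityRisk = factor(sample(1:3, 60, replace = TRUE)),
  Depth = sample(1:3, 60, replace = TRUE),
  Mag   = sample(1:3, 60, replace = TRUE)
)

x <- gempaTrain[, -1, drop = FALSE]  # predictors: stays a data frame
y <- gempaTrain[[1]]                 # outcome: a vector, even for a tibble
stopifnot(is.factor(y))

if (requireNamespace("C50", quietly = TRUE)) {
  # x/y interface with the outcome passed as a factor vector.
  fit_xy <- C50::C5.0(x, y)
  # Formula interface: no manual splitting of outcome and predictors.
  fit_formula <- C50::C5.0(IntensityRisk ~ ., data = gempaTrain)
}
```

Either way, `IntensityRisk` still has to be a factor, as in the question's second attempt (`as.factor()` before splitting).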