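I have looked at other questions about this error, but none of them describe a problem like mine. My data has no null values, and none of the variable names in my dataset are used inside the C50 package.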
Here is the structure of the dataset I used (no null values):
> str(dataset)
'data.frame': 776973 obs. of 13 variables:
$ CrimeID : int 9446748 9446846 9446876 9447044 9447227 9447263 9447282 9447312 9447340 9447387 ...
$ CaseNumber : Factor w/ 776907 levels "161884","F218264",..: 67 111 157 283 372 404 421 435 457 487 ...
$ CrimeDate : Factor w/ 326056 levels "1/1/2014 0:00",..: 1 1 1 1 1 1 1 1 1 1 ...
$ CrimeBlock : Factor w/ 31381 levels "0000X E 100TH PL",..: 3101 4085 26441 10811 6414 3183 7076 11201 12166 5271 ...
$ IUCR : Factor w/ 357 levels "031A","031B",..: 345 51 52 333 52 347 347 345 52 334 ...
$ LocationDescription: Factor w/ 135 levels "ABANDONED BUILDING",..: 24 18 122 24 122 122 122 18 122 122 ...
$ Arrest : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ Domestic : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ Beat : int 1832 1133 1631 1932 1932 1533 1012 1413 1033 1211 ...
$ District : int 18 11 16 19 19 15 10 14 10 12 ...
$ Ward : int 42 24 36 43 32 24 24 35 12 26 ...
$ CommunityArea : int 8 27 17 7 7 25 29 22 30 24 ...
$ FBICode : Factor w/ 26 levels "01A","01B","04A",..: 24 11 11 24 11 25 25 24 11 24 ...
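For readers who want to verify the "no null values" statement on their own copy of the data, a minimal base-R sketch follows. The data frame `toy` and its values are invented for illustration and are not part of the real dataset. The last check is an assumption about what might count as "missing": empty strings in a factor are not reported by `is.na()`.

```r
# Toy stand-in for the real data; `toy` and its values are made up for illustration.
toy <- data.frame(
  Arrest   = c(FALSE, TRUE, NA, FALSE),
  District = c(18L, 11L, 16L, NA),
  IUCR     = factor(c("031A", "", "031B", "031A"))
)

# TRUE if any cell in the data frame is NA.
has_na <- anyNA(toy)

# Number of NA cells in each column.
na_per_column <- colSums(is.na(toy))

# Empty strings are not NA, so count them separately per column.
empty_per_column <- sapply(
  toy,
  function(col) sum(as.character(col) == "", na.rm = TRUE)
)

print(has_na)
print(na_per_column)
print(empty_per_column)
```

On the real data, the same three expressions would be applied to `dataset` instead of `toy`.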
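The variable Arrest will be used as the target variable in the decision tree. I therefore converted it to a factor, renamed the dataset to crimechicago, set a seed to create random training and test datasets, loaded the C50 library, and ran the C5.0 code. The code runs for over an hour and then returns the error: c50 code called exit with value 1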
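# Convert the logical target to a factor, as C5.0 expects a factor outcome
dataset$Arrest <- factor(dataset$Arrest)
crimechicago <- dataset
# Fix the seed so the random split is reproducible
set.seed(222)
totalvalues <- nrow(crimechicago)
# Draw 400,000 row indices for training; the remaining rows form the test set
train_sample <- sample(totalvalues, 400000)
crimechicago_train <- crimechicago[train_sample, ]
crimechicago_test <- crimechicago[-train_sample, ]
library(C50)
# Column 7 is Arrest in the 13-column dataset, so [-7] leaves only the predictors
crimechicago_model <- C5.0(crimechicago_train[-7], crimechicago_train$Arrest)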
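As a sanity check on the split itself, the same index-based approach can be tried on a small invented data frame. This is only a sketch: `toy`, its `id` column, and the sample size of 6 are hypothetical and chosen so the result is easy to inspect.

```r
# Hypothetical 10-row stand-in for crimechicago; values are invented.
set.seed(222)
toy <- data.frame(
  id     = 1:10,
  Arrest = factor(rep(c(FALSE, TRUE), 5))
)

# Same pattern as above: sample row indices, then split by index.
train_idx <- sample(nrow(toy), 6)
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# The two parts should cover every row exactly once.
sizes_ok   <- nrow(toy_train) + nrow(toy_test) == nrow(toy)
no_overlap <- length(intersect(toy_train$id, toy_test$id)) == 0

# Share of each Arrest class in the training part.
class_balance <- prop.table(table(toy_train$Arrest))

print(sizes_ok)
print(no_overlap)
print(class_balance)
```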
Edit:
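- Removed CrimeID and CaseNumber from the dataset, since they are not useful predictors of the target variable Arrest
- A brief screenshot of the dataset (the whole dataset, not a subset):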
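Removing columns shifts the position of every column after them, so a positional index such as `[-7]` written for the 13-column data would no longer point at Arrest. A hedged sketch of dropping columns and selecting the target by name instead is shown below. The data frame `toy` and its values are invented for illustration.

```r
# Invented miniature with the same column names as the real data.
toy <- data.frame(
  CrimeID    = c(9446748L, 9446846L, 9446876L),
  CaseNumber = factor(c("A1", "B2", "C3")),  # made-up case numbers
  IUCR       = factor(c("031A", "031B", "031A")),
  Arrest     = factor(c(FALSE, TRUE, FALSE)),
  Beat       = c(1832L, 1133L, 1631L)
)

# Drop identifier columns by name rather than by position.
drop_cols   <- c("CrimeID", "CaseNumber")
toy_reduced <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

# Position of the target before and after the removal.
position_before <- which(names(toy) == "Arrest")
position_after  <- which(names(toy_reduced) == "Arrest")

# Select predictors and target by name, independent of column order.
predictors <- toy_reduced[, setdiff(names(toy_reduced), "Arrest"), drop = FALSE]
target     <- toy_reduced$Arrest

print(c(before = position_before, after = position_after))
print(names(predictors))
```

With this pattern, `predictors` and `target` could be passed to `C5.0()` as its `x` and `y` arguments without relying on a hard-coded column number.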
Structure of the training dataset (400,000 rows, created by randomly sampling from the original dataset of more than 700,000 rows):
str(crimechicago_train)
'data.frame': 400000 obs. of 10 variables:
$ CrimeDate : Factor w/ 326056 levels "1/1/2014 0:00",..: 300760 132223 211541 3 287239 54284 93432 133588 284191 232747 ...
$ CrimeBlock : Factor w/ 31381 levels "0000X E 100TH PL",..: 124 14942 2696 24466 143 9024 10613 22404 17613 10766 ...
$ IUCR : Factor w/ 357 levels "031A","031B",..: 209 274 25 51 334 345 329 274 347 329 ...
$ LocationDescription: Factor w/ 135 levels "ABANDONED BUILDING",..: 118 18 80 106 80 110 18 118 122 18 ...
$ Arrest : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 1 1 1 1 1 1 1 ...
$ Domestic : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 2 1 1 1 2 1 1 ...
$ Beat : int 113 1133 1834 825 1834 1434 1921 715 2522 1431 ...
$ District : int 1 11 18 8 18 14 19 7 25 14 ...
$ Ward : int 42 24 42 15 42 32 47 15 30 1 ...
$ CommunityArea : int 32 27 8 66 8 24 5 67 20 22 ...
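The output above shows that the factor columns in the training subset keep the level counts of the full dataset (for example, CrimeDate still lists 326056 levels in 400,000 rows). A small base-R sketch for summarising level counts, and for counting levels that no longer occur after subsetting, is given below. The data frame `toy_full` and its values are invented, and whether these level counts are related to the error is not established here.

```r
# Hypothetical miniature of the data; values are invented.
toy_full <- data.frame(
  CrimeDate = factor(c("1/1/2014 0:00", "1/1/2014 0:05",
                       "1/2/2014 9:30", "1/3/2014 7:00")),
  IUCR      = factor(c("031A", "031B", "031A", "031A")),
  Beat      = c(1832L, 1133L, 1631L, 1932L),
  Arrest    = factor(c(FALSE, TRUE, FALSE, FALSE))
)

# Number of levels for each factor column; NA for non-factor columns.
level_counts <- sapply(
  toy_full,
  function(col) if (is.factor(col)) nlevels(col) else NA_integer_
)

# Subsetting rows keeps all declared levels, even those no longer present.
toy_subset <- toy_full[1:2, ]
unused_counts <- sapply(toy_subset, function(col) {
  if (is.factor(col)) nlevels(col) - nlevels(droplevels(col)) else NA_integer_
})

print(level_counts)
print(unused_counts)
```

Calling `droplevels()` on the subset data frame would remove the unused levels from every factor column at once.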