C5.0 runs for more than an hour and then returns "c50 code called exit with value 1"

Date: 2017-02-10 16:46:58

Tags: r

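I have looked at other questions about this error, but none of them describe a problem like mine. My data has no null values, and none of the variable names in my dataset are names used inside the C50 package.

This is the structure of the dataset (no null values):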

> str(dataset)
'data.frame':   776973 obs. of  13 variables:
$ CrimeID            : int  9446748 9446846 9446876 9447044 9447227 9447263 9447282 9447312 9447340 9447387 ...
$ CaseNumber         : Factor w/ 776907 levels "161884","F218264",..: 67 111 157 283 372 404 421 435 457 487 ...
$ CrimeDate          : Factor w/ 326056 levels "1/1/2014 0:00",..: 1 1 1 1 1 1 1 1 1 1 ...
$ CrimeBlock         : Factor w/ 31381 levels "0000X E 100TH PL",..: 3101 4085 26441 10811 6414 3183 7076 11201 12166 5271 ...
$ IUCR               : Factor w/ 357 levels "031A","031B",..: 345 51 52 333 52 347 347 345 52 334 ...
$ LocationDescription: Factor w/ 135 levels "ABANDONED BUILDING",..: 24 18 122 24 122 122 122 18 122 122 ...
$ Arrest             : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
$ Beat               : int  1832 1133 1631 1932 1932 1533 1012 1413 1033 1211 ...
$ District           : int  18 11 16 19 19 15 10 14 10 12 ...
$ Ward               : int  42 24 36 43 32 24 24 35 12 26 ...
$ CommunityArea      : int  8 27 17 7 7 25 29 22 30 24 ...
$ FBICode            : Factor w/ 26 levels "01A","01B","04A",..: 24 11 11 24 11 25 25 24 11 24 ...
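A minimal sketch for checking the "no null values" claim column by column. `toy` is a hypothetical three-row stand-in for `dataset`, with one NA planted in District so the check has something to find. The blank-string count is an extra check: read.csv() treats blank fields as NA only in logical, integer, numeric and complex columns, so empty text in factor columns stays as a "" level unless na.strings is set.

```r
# Hypothetical toy data standing in for `dataset`; replace with the real data frame.
toy <- data.frame(
  CrimeID  = c(9446748L, 9446846L, 9446876L),
  IUCR     = factor(c("031A", "031B", "031A")),
  Arrest   = c(FALSE, TRUE, FALSE),
  District = c(18L, 11L, NA)
)

# Count NA values in each column (is.na() on a data frame returns a logical matrix).
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Count empty or whitespace-only entries in each factor column; these are not NA.
blank_per_factor <- sapply(
  Filter(is.factor, toy),
  function(col) sum(trimws(as.character(col)) == "", na.rm = TRUE)
)
print(blank_per_factor)
```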

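The variable Arrest will be used as the target variable in the decision tree. So I converted it to a factor, renamed the dataset to crimechicago, set a seed to create random training and test datasets, loaded the C50 library, and ran C5.0. The code runs for more than an hour and then returns the error: c50 code called exit with value 1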

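dataset$Arrest <- factor(dataset$Arrest)   # logical -> factor with levels FALSE/TRUE
crimechicago <- dataset
set.seed(222)                              # make the random split reproducible
totalvalues <- nrow(crimechicago)
train_sample <- sample(totalvalues, 400000)            # 400,000 row indices for training
crimechicago_train <- crimechicago[train_sample, ]
crimechicago_test  <- crimechicago[-train_sample, ]    # remaining rows for testing
library(C50)
crimechicago_model <- C5.0(crimechicago_train[-7], crimechicago_train$Arrest)   # column 7 of the 13-column dataset is Arrest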
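The random split above can be sanity-checked on a small scale. A minimal sketch, assuming a hypothetical 10-row `crimechicago_toy` with an added `id` column so rows can be tracked; the real code samples 400000 of 776973 rows in the same way.

```r
# Hypothetical stand-in for `crimechicago`: 10 rows instead of 776,973.
crimechicago_toy <- data.frame(
  id     = 1:10,
  Arrest = factor(c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE))
)

set.seed(222)
totalvalues  <- nrow(crimechicago_toy)
train_sample <- sample(totalvalues, 6)   # 6 of 10 rows, sampled without replacement

train <- crimechicago_toy[train_sample, ]
test  <- crimechicago_toy[-train_sample, ]

# Every row lands in exactly one of the two sets.
stopifnot(nrow(train) + nrow(test) == totalvalues)
stopifnot(length(intersect(train$id, test$id)) == 0)

# Class balance of the target in each set.
print(table(train$Arrest))
print(table(test$Arrest))
```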

Edit:

- Removed CrimeID and CaseNumber from the dataset, since they are not useful predictors of the target variable Arrest

- A brief screenshot of the dataset (the whole dataset, not the subset):

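Dropping identifier columns by name rather than by position avoids depending on column order. A minimal sketch on a hypothetical three-row `dataset_toy` that holds only four of the original columns; the values are illustrative.

```r
# Hypothetical three-row stand-in for `dataset` (only four of the 13 columns).
dataset_toy <- data.frame(
  CrimeID    = c(9446748L, 9446846L, 9446876L),
  CaseNumber = factor(c("161884", "F218264", "161885")),
  IUCR       = factor(c("031A", "031B", "031A")),
  Arrest     = factor(c(FALSE, TRUE, FALSE))
)

# Drop the identifier columns by name instead of by position.
id_columns  <- c("CrimeID", "CaseNumber")
dataset_toy <- dataset_toy[, !(names(dataset_toy) %in% id_columns)]
print(names(dataset_toy))
```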

Structure of the training dataset (400,000 rows, created by random selection from the original dataset of 700,000+ rows):

str(crimechicago_train)
'data.frame':   400000 obs. of  10 variables:
 $ CrimeDate          : Factor w/ 326056 levels "1/1/2014 0:00",..: 300760 132223 211541 3 287239 54284 93432 133588 284191 232747 ...
 $ CrimeBlock         : Factor w/ 31381 levels "0000X E 100TH PL",..: 124 14942 2696 24466 143 9024 10613 22404 17613 10766 ...
 $ IUCR               : Factor w/ 357 levels "031A","031B",..: 209 274 25 51 334 345 329 274 347 329 ...
 $ LocationDescription: Factor w/ 135 levels "ABANDONED BUILDING",..: 118 18 80 106 80 110 18 118 122 18 ...
 $ Arrest             : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 1 1 1 1 1 1 1 ...
 $ Domestic           : Factor w/ 2 levels "FALSE","TRUE": 1 2 1 2 1 1 1 2 1 1 ...
 $ Beat               : int  113 1133 1834 825 1834 1434 1921 715 2522 1431 ...
 $ District           : int  1 11 18 8 18 14 19 7 25 14 ...
 $ Ward               : int  42 24 42 15 42 32 47 15 30 1 ...
 $ CommunityArea      : int  32 27 8 66 8 24 5 67 20 22 ...
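In the original 13-column dataset Arrest is column 7, so `crimechicago_train[-7]` removes the target from the predictors. In the 10-variable training structure above, however, Arrest is column 5 and column 7 is Beat, so applying the same positional index to that data would keep Arrest among the predictors. The following is a minimal sketch of selecting the target and predictors by name instead. It uses a hypothetical three-row `train_toy` with the same column order; the numeric values come from the str output, and the text values are invented. It also lists how many levels each factor predictor has. The `C5.0(x, y)` call is left commented out so the snippet runs without C50 installed.

```r
# Hypothetical stand-in with the same column order as the 10-variable training set.
train_toy <- data.frame(
  CrimeDate           = factor(c("1/1/2014 0:00", "1/2/2014 0:00", "1/3/2014 0:00")),
  CrimeBlock          = factor(c("BLOCK A", "BLOCK B", "BLOCK A")),
  IUCR                = factor(c("031A", "031B", "031A")),
  LocationDescription = factor(c("ABANDONED BUILDING", "STREET", "STREET")),
  Arrest              = factor(c(FALSE, TRUE, FALSE)),
  Domestic            = factor(c(FALSE, TRUE, FALSE)),
  Beat                = c(113L, 1133L, 1834L),
  District            = c(1L, 11L, 18L),
  Ward                = c(42L, 24L, 42L),
  CommunityArea       = c(32L, 27L, 8L)
)

# Positional indexing depends on column order: here column 7 is Beat.
print(names(train_toy)[7])

# Select the target and the predictors by name instead.
target     <- train_toy$Arrest
predictors <- train_toy[, setdiff(names(train_toy), "Arrest")]

# Number of levels in each factor predictor.
factor_levels <- sapply(Filter(is.factor, predictors), nlevels)
print(factor_levels)

# With the C50 package installed, the model call would then be:
# library(C50)
# model <- C5.0(x = predictors, y = target)
```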
