R: caret training with a high number of instances

Asked: 2016-08-10 08:30:34

标签: r machine-learning r-caret bigdata

I have a dataset that I need to analyse in R with some ML algorithms. This is my code:

# data is a big data frame (351444 observations and 39 variables).
# It contains physiological data about different users.
library('mlbench')
library('randomForest')
library('nnet')
library("kernlab")
library('caret')
library('rpart')
library('caretEnsemble')
library('pROC')
data$target <- as.factor(data$target) # target must be a factor for classification

set.seed(12)
trainIndex <- createDataPartition(data$target, p = .8,
                                  list = FALSE,
                                  times = 1)
Train <- data[ trainIndex,]
Test  <- data[-trainIndex,]

my_control <- trainControl(
  method='cv',
  number=5,
  savePredictions=TRUE,
  classProbs=TRUE,
  verboseIter=TRUE,
  allowParallel=TRUE,
  index=createResample(Train$target), # note: a supplied index overrides the method='cv' folds
  summaryFunction=twoClassSummary
)

model_list <- caretList(
  target~., data=Train,
  trControl=my_control,
  metric='ROC',
  # methodList=c('glm', 'rpart'),
  tuneList=list(
    svmLinear=caretModelSpec(method='svmLinear',tuneGrid=expand.grid(.C=c(0.01,0.1,1,3,5,10,20))),
    svmPoly=caretModelSpec(method='svmPoly',tuneGrid=expand.grid(.degree=(2:5), .scale=.1, .C=c(0.01,0.1,1,3,5,10,20))),
    svmRadial=caretModelSpec(method='svmRadial',tuneGrid=expand.grid(.sigma=.1,.C=c(0.01,0.1,1,3,5,10,20))),
    rf=caretModelSpec(method='rf'),
    nn=caretModelSpec(method='nnet',tuneGrid=expand.grid(.size=(1:15),.decay=c(0.01,0.1,1,3,5,10,20)))
  )
)

When I run this code, everything seems to work fine at first. However, after a while the `caretList` call fails with a memory allocation error:

    + Resample01: C= 0.01 
    model fit failed for Resample01: C= 0.01 Error : cannot allocate vector of size 427.1 Gb
    - Resample01: C= 0.01 
    + Resample01: C= 0.10 
    model fit failed for Resample01: C= 0.10 Error : cannot allocate vector of size 427.1 Gb
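The size of the failed allocation is consistent with a full kernel matrix: kernlab's SVM methods build an n × n matrix of doubles over the training rows, so memory grows quadratically with the number of instances. A rough back-of-the-envelope check (the row count here is an assumption, not from the post):

```r
# Memory needed for an n x n kernel matrix of doubles (8 bytes each),
# assuming the full matrix is materialised for one resample.
n <- 239000                 # hypothetical number of rows in one resample
gib <- n^2 * 8 / 2^30       # bytes -> GiB
round(gib, 1)               # on the order of hundreds of GiB, matching the error
```

This is why the error appears only for the SVM models and not, say, for `rf` or `nnet`, whose memory use scales far more gently with the number of rows.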

I somewhat expected this error, but I don't know how to work around it. I don't see how I can reduce the number of rows (they are physiological measurements from users).
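One possible direction (not from the original post, just a sketch): since a kernel SVM cannot hold an n × n matrix for ~280k training rows, the model could be fit on a stratified subsample, reusing `createDataPartition` so class proportions are preserved. The 5% fraction below is an arbitrary illustration:

```r
# Hypothetical workaround: fit the SVMs on a stratified subsample of Train
# so the kernel matrix stays tractable, keeping the rest for validation.
set.seed(12)
subIndex   <- createDataPartition(Train$target, p = 0.05, list = FALSE)
TrainSmall <- Train[subIndex, ]   # ~5% of rows, same class balance as Train
```

Alternatively, a purely linear SVM with a solver that never forms the kernel matrix (e.g. the LiblineaR package) avoids the quadratic memory cost entirely.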

Can you help me?

Best regards

0 Answers