Question

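I am trying to build a predictive model with the caret package in R and to call predictions on new data from the terminal / cmd. Here is a reproducible example: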

# Sonar_training.R
  ## learning and saving model
library(caret)
library(mlbench)
data(Sonar)
set.seed(107)
inTrain <- createDataPartition(y = Sonar$Class, p = .75,list = FALSE)
training <- Sonar[ inTrain,]
testing  <- Sonar[-inTrain,]
saveRDS(testing,"test.rds")
ctrl <- trainControl(method = "repeatedcv",
                 repeats = 3)
plsFit <- train(Class ~ .,data = training,method = "pls",
            tuneLength = 15,
            trControl = ctrl,
            preProc = c("center", "scale"))

plsClasses <- predict(plsFit, newdata = testing)

saveRDS(plsFit,"fit.rds")

This is the script that is called by Rscript.exe:

# script.R
  ##reading model and predict test data
t <- Sys.time()
pls <- readRDS("fit.rds")
testing <- readRDS("test.rds")
head(predict(pls, newdata = testing))
print(Sys.time() - t)

I run it in the terminal with the following command:

pawel@pawel-MS-1753:~$ Rscript script.R
Loading required package: pls

Attaching package: ‘pls’

The following object is masked from ‘package:stats’:

loadings

[1] M M R M R R
Levels: M R
Time difference of 2.209697 secs

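Is there a way to make this faster / more efficient? For example, is it possible to avoid loading the packages on every execution? And is readRDS the right way to read the model in this case?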

Answer 1

You can try the "profvis" package to profile your code:

library(profvis)
profvis({
  # repeat the code so the profiler collects enough samples
  for (i in 1:100) {
    # your code here
  }
})

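I tried it: 99% of the execution time is training time, 1% is saving/loading the RDS data, and the remaining cost is approximately 0 (loading packages, loading data, ...).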

So unless you want to optimize the training function yourself, there seem to be few ways left to shorten the execution time.
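
As a lighter-weight complement to profvis, the individual stages of a prediction script can also be timed with base R's system.time(). The following is only a minimal sketch: the lm() model on mtcars, the temporary file, and the splines package are stand-ins chosen so that the example runs without caret, not the PLS fit or the packages from the question.

```r
# Stand-in model saved to a temporary .rds file (assumption: replaces fit.rds)
model_path <- tempfile(fileext = ".rds")
saveRDS(lm(mpg ~ wt + hp, data = mtcars), model_path)

# Time each stage separately; system.time() returns user/system/elapsed seconds
t_load <- system.time(suppressPackageStartupMessages(library(splines)))
t_read <- system.time(fit <- readRDS(model_path))
t_pred <- system.time(preds <- predict(fit, newdata = head(mtcars)))

# Collect the elapsed (wall-clock) time of every stage in one named vector
timings <- c(
  load_package = unname(t_load["elapsed"]),
  read_model   = unname(t_read["elapsed"]),
  predict      = unname(t_pred["elapsed"])
)
print(timings)
```

Replacing the stand-ins with library(caret), readRDS("fit.rds") and the predict() call from script.R should show which of the three stages accounts for most of the roughly two seconds reported in the question.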

Answer 2

I have seen this happen with PLS classification models, and I am not sure what the issue is. However, try using [...]. You will get roughly the same answers, and it should finish quickly.

Rscript - long execution time