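I am trying to limit the execution time of an analysis, but I want to keep the work it has already completed.
In my case, I am running xgb.cv (from the xgboost R package), and I want to keep all iterations completed before the analysis reaches 10 seconds (or "n" seconds/minutes/hours).
I tried the approach mentioned in this thread, but it stops after 10 seconds without keeping the iterations completed up to that point.
Here is my code: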
require(xgboost)
require(R.utils)

data(iris)
train.model <- model.matrix(Sepal.Length ~ ., iris)
dtrain <- xgb.DMatrix(data = train.model, label = iris$Sepal.Length)

# Custom evaluation metric: root mean squared difference of the log values
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- sqrt(sum((log(preds) - log(labels))^2) / length(labels))
  return(list(metric = "error", value = err))
}

xgb_grid = list(eta = 0.05, max_depth = 5, subsample = 0.7, gamma = 0.3,
                min_child_weight = 1)

# Run the cross-validation, aborting after 10 seconds
fit_boost <- tryCatch(
  expr = {
    evalWithTimeout({
      xgb.cv(data = dtrain,
             nrounds = 10000,
             objective = "reg:linear",
             eval_metric = evalerror,
             early_stopping_rounds = 300,
             print_every_n = 100,
             params = xgb_grid,
             colsample_bytree = 0.7,
             nfold = 5,
             prediction = TRUE,
             maximize = FALSE
      )},
      timeout = 10)
  },
  TimeoutException = function(ex) cat("Timeout. Skipping.\n"))
Output:
#Error in dim.xgb.DMatrix(x) : reached CPU time limit
Thanks!
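Answer 0 (score: 1)
Wrap the whole thing in R's capture.output() function. This stores all of the evaluation output as an R object. Again, I think you are looking for something more, but this is at least native and malleable. Syntax: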
fit_boost <- capture.output(tryCatch(expr = {evalWithTimeout({...})}))
> fit_boost
[1] "[1]\ttrain-error:2.033160+0.006109\ttest-error:2.034180+0.017467 " ...
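To illustrate why this keeps the earlier output, here is a minimal sketch in base R only. The loop and the simulated error are stand-ins for xgb.cv and the timeout, not the real calls:

```r
# Stand-in for a long-running job that prints progress and then fails.
# The loop and the stop() call are illustrative assumptions, not xgboost code.
kept <- capture.output(
  tryCatch(
    expr = {
      for (i in 1:5) {
        cat(sprintf("[%d] round finished\n", i))
        if (i == 3) stop("simulated timeout")
      }
    },
    error = function(ex) cat("Timeout. Skipping.\n")
  )
)

# Everything printed before the error is still available as a character vector.
print(kept)
```

Because the error is handled inside tryCatch, capture.output() returns normally and the lines printed before the failure are not lost.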
You can also use sink. Just add this line before starting the cross-validation:
sink("evaluationLog.txt")
fit_boost <- tryCatch(
  expr = {
    evalWithTimeout({
      xgb.cv(data = dtrain,
             nrounds = 10000,
             objective = "reg:linear",
             eval_metric = evalerror,
             early_stopping_rounds = 300,
             print_every_n = 100,
             params = xgb_grid,
             colsample_bytree = 0.7,
             nfold = 5,
             prediction = TRUE,
             maximize = FALSE
      )},
      timeout = 10)
  },
  TimeoutException = function(ex) cat("Timeout. Skipping.\n"))
sink()
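The sink() at the end would normally return output to the console, but in this case it does not, because an error is thrown. Once you have run this, though, you can open evaluationLog.txt and voilà: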
[1] train-error:2.033217+0.003705 test-error:2.032427+0.012808
Multiple eval metrics are present. Will use test_error for early stopping.
Will train until test_error hasn't improved in 300 rounds.
[101] train-error:0.045297+0.000396 test-error:0.060047+0.001849
[201] train-error:0.042085+0.000852 test-error:0.059798+0.002382
[301] train-error:0.041117+0.001032 test-error:0.059733+0.002701
[401] train-error:0.040340+0.001170 test-error:0.059481+0.002973
[501] train-error:0.039988+0.001145 test-error:0.059469+0.002929
[601] train-error:0.039698+0.001028 test-error:0.059416+0.003018
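Of course, this is not perfect. I imagine you want to do some manipulation on these values, and this is not the best format for that. Still, converting it into something more manageable is not a tall order. I have not yet found a way to save the actual xgb.cv$evaluation_log object before the timeout. This is a very good question.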
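As a rough sketch of that conversion, the log lines can be parsed into a data frame with base R. This assumes each iteration line has the form shown in the log above ("[iter] train-error:mean+std test-error:mean+std"). parse_eval_log and its column names are hypothetical, not part of xgboost:

```r
# Sample lines copied from the log above; in practice one would probably use
# log_lines <- readLines("evaluationLog.txt")
log_lines <- c(
  "[1] train-error:2.033217+0.003705 test-error:2.032427+0.012808",
  "Multiple eval metrics are present. Will use test_error for early stopping.",
  "[101] train-error:0.045297+0.000396 test-error:0.060047+0.001849",
  "[201] train-error:0.042085+0.000852 test-error:0.059798+0.002382"
)

# Hypothetical helper: turns iteration lines into a numeric data frame.
parse_eval_log <- function(lines) {
  cols <- c("iter", "train_error_mean", "train_error_std",
            "test_error_mean", "test_error_std")
  pattern <- paste0(
    "^\\[([0-9]+)\\][[:space:]]+",
    "train-error:([0-9.]+)\\+([0-9.]+)[[:space:]]+",
    "test-error:([0-9.]+)\\+([0-9.]+)[[:space:]]*$"
  )
  hits <- regmatches(lines, regexec(pattern, lines))
  # Keep only iteration records (full match plus five captured groups).
  hits <- hits[lengths(hits) == 6]
  if (length(hits) == 0) {
    empty <- as.data.frame(matrix(numeric(0), ncol = 5))
    names(empty) <- cols
    return(empty)
  }
  # Drop the full-match column and convert the captured groups to numbers.
  fields <- do.call(rbind, hits)[, -1, drop = FALSE]
  out <- as.data.frame(matrix(as.numeric(fields), ncol = 5))
  names(out) <- cols
  out
}

eval_log <- parse_eval_log(log_lines)
print(eval_log)
```

The result only covers the iterations that were actually printed, so with print_every_n = 100 it has one row per hundred rounds. Setting print_every_n = 1 would log every round, at the cost of a larger file.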