Question

I created the following function in Python:

def cross_validate(algorithms, data, labels, cv=4, n_jobs=-1):
    print "Cross validation using: "
    for alg, predictors in algorithms:
        print alg
        print
        # Compute the accuracy score for all the cross validation folds. 
        scores = cross_val_score(alg, data, labels, cv=cv, n_jobs=n_jobs)
        # Take the mean of the scores (because we have one for each fold)
        print scores
        print("Cross validation mean score = " + str(scores.mean()))

        name = re.split('\(', str(alg))
        filename = str('%0.5f' %scores.mean()) + "_" + name[0] + ".pkl"
        # We might use this another time 
        joblib.dump(alg, filename, compress=1, cache_size=1e9)  
        filenameL.append(filename)
        try:
            move(filename, "pkl")
        except:
            os.remove(filename) 

        print 
    return

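I assumed that, in order to perform the cross-validation, sklearn had to fit the model.

However, when I try to use it later (f is the pkl file I saved above with joblib.dump(alg, filename, compress=1, cache_size=1e9)):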

alg = joblib.load(f)  
predictions = alg.predict_proba(train_data[predictors]).astype(float)

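I get no error on the first line (so the load appears to work), but the next line raises NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

What am I doing wrong? Can't I reuse the model that was fitted to compute the cross-validation? I looked at "Keep the fitted parameters when using a cross_val_score in scikits learn", but either I don't understand the answer or it is not what I am looking for. What I want is to save the whole model with joblib so that I can use it later without refitting.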

Answer 1

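It is not quite correct that cross-validation has to fit your model; rather, k-fold cross-validation fits your model k times, each time on a part of the dataset. If you want the model itself, you need to fit it again on the whole dataset; that is not part of the cross-validation process. So it would not be redundant to call

alg.fit(data, labels)

to fit your model after the cross-validation.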
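The two steps described above (score with cross-validation, then fit once on all the data and save) can be sketched as follows. This is only an illustration under assumptions that are not in the question: it uses Python 3 with a recent scikit-learn (`sklearn.model_selection`), the standalone `joblib` package, the iris dataset, a `LogisticRegression` estimator, and a temporary directory for the pickle file.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumed stand-ins for the asker's data and estimator.
data, labels = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Step 1: cross-validation only estimates performance; alg itself stays unfitted.
scores = cross_val_score(alg, data, labels, cv=4)
print("Cross validation mean score = %0.5f" % scores.mean())

# Step 2: fit on the whole dataset, then persist the fitted model.
alg.fit(data, labels)
filename = os.path.join(
    tempfile.mkdtemp(), "%0.5f_LogisticRegression.pkl" % scores.mean()
)
joblib.dump(alg, filename, compress=1)

# Later: load the fitted model and predict without refitting.
loaded = joblib.load(filename)
predictions = loaded.predict_proba(data)
```

The only change relative to the function in the question is the explicit `alg.fit(data, labels)` before `joblib.dump`, so the pickled object is a fitted estimator.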

Another approach is to use GridSearchCV instead of the specialized function cross_val_score; you can think of your case as a special case of a cross-validated grid search (with a single point in the parameter space). In that case GridSearchCV will by default refit the model on the entire dataset (it has a parameter refit=True), and its API also provides predict and predict_proba methods.
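A minimal sketch of the single-point grid search described above, again assuming Python 3, a recent scikit-learn, the iris dataset, and a `LogisticRegression` estimator. The choice of `C=1.0` as the single grid point is arbitrary and only there for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Assumed stand-ins for the asker's data and estimator.
data, labels = load_iris(return_X_y=True)

# A grid with a single point: the search reduces to plain 4-fold cross-validation.
param_grid = {"C": [1.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=4)

# With the default refit=True, the best (here: only) candidate is refitted
# on the whole dataset once the cross-validation is done.
search.fit(data, labels)

mean_cv_score = search.best_score_           # mean score over the 4 folds
fitted_model = search.best_estimator_        # estimator refitted on all the data
probabilities = search.predict_proba(data)   # delegates to best_estimator_
```

Either `search` or `search.best_estimator_` can then be passed to `joblib.dump` and reused later without refitting.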

Answer 2

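The real reason the model is not fitted is that cross_val_score first clones the model and only then fits the clone: Source link

So your original model has never been fitted.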
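The cloning behaviour can be checked directly. The sketch below assumes Python 3, a recent scikit-learn, the iris dataset, and a `LogisticRegression` estimator. It also uses scikit-learn's own `cross_validate` (not the asker's function of the same name), which can hand back the fitted clones when `return_estimator=True` is passed.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.utils.validation import check_is_fitted

# Assumed stand-ins for the asker's data and estimator.
data, labels = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# cross_val_score fits clones of alg, one per fold, and discards them.
cross_val_score(alg, data, labels, cv=4)

# The original object is therefore still unfitted.
try:
    check_is_fitted(alg)
    original_is_fitted = True
except NotFittedError:
    original_is_fitted = False

# sklearn's cross_validate can return the fitted per-fold clones instead.
results = cross_validate(alg, data, labels, cv=4, return_estimator=True)
fold_models = results["estimator"]
```

Note that each entry in `fold_models` was trained on only part of the data, so these are not a substitute for a model fitted on the whole dataset.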

Answer 3

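cross_val_score does not keep the fitted model. cross_val_predict does not return it either, but it does return the predictions made by the per-fold models. There is no cross_val_predict_proba, but you can do this: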

predict_proba for a cross-validated model
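As a sketch of one way to get cross-validated probabilities: in recent scikit-learn versions `cross_val_predict` accepts a `method` argument, so `method="predict_proba"` returns out-of-fold class probabilities. This may differ from what the linked answer does. The iris dataset and `LogisticRegression` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Assumed stand-ins for the asker's data and estimator.
data, labels = load_iris(return_X_y=True)
alg = LogisticRegression(max_iter=1000)

# Each row is predicted by a clone that did not see that row during fitting.
probabilities = cross_val_predict(
    alg, data, labels, cv=4, method="predict_proba"
)
```

This yields predictions only; `alg` itself remains unfitted, so it still has to be fitted on the full dataset before being saved with joblib.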

Reusing a model fitted by cross_val_score in sklearn using joblib
