Tuning hyperparameters with a random forest classifier

Time: 2020-10-21 10:36:26

Tags: python machine-learning random-forest

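Hi, I'm trying to fine-tune a model on my data, and I'm not sure whether I'm doing it right. I'm using the Kaggle credit card data. However, I get the error [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers. What does it mean?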

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import RandomizedSearchCV, train_test_split

        data = pd.read_csv('creditcard.csv')

        # Set up the training and test sets
        X = data.drop('Class', axis=1)
        Y = data['Class']
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30, random_state=0)

        # Baseline model
        model = RandomForestClassifier(n_estimators=10)
        model.fit(X_train, Y_train)
        # Predictions
        y_pred = model.predict(X_test)

        # Tuning hyperparameters
        random_search = {'criterion': ['entropy', 'gini'],
                         'max_depth': list(np.linspace(10, 1200, 10, dtype=int)) + [None],
                         # 'auto' omitted: for classifiers it was an alias of 'sqrt'
                         # and was removed in scikit-learn 1.3
                         'max_features': ['sqrt', 'log2', None],
                         'min_samples_leaf': [4, 6, 8, 12],
                         'min_samples_split': [5, 7, 10, 14],
                         'n_estimators': list(np.linspace(151, 1200, 10, dtype=int))}

        model = RandomForestClassifier()
        model_random = RandomizedSearchCV(estimator=model, param_distributions=random_search, n_iter=100,
                                          cv=3, verbose=2, random_state=1, n_jobs=-1)
        model_random.fit(X_train, Y_train)
        print(model_random.best_params_)
        print(model_random.best_score_)
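
For what it's worth, the quoted [Parallel(n_jobs=-1)] line appears to be a progress message that joblib prints when verbose is greater than 0 and several workers are used, rather than an error. With n_iter=100 and cv=3 the search above runs 300 forest fits on the full dataset, so it can take a long time. Below is a minimal sketch of the same search on a small scale so it finishes quickly. The synthetic data from make_classification, the reduced grid, and the names N_JOBS and param_distributions are assumptions for illustration and are not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Assumption: a small synthetic dataset stands in for creditcard.csv.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Reduced search space (illustrative values, much smaller than in the question).
param_distributions = {
    "criterion": ["entropy", "gini"],
    "max_depth": [5, 10, None],
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [4, 6, 8],
    "n_estimators": [20, 40],
}

# Set N_JOBS to -1 to use all cores; joblib may then log its
# "Using backend LokyBackend ..." line because verbose > 0.
N_JOBS = 1

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # 5 sampled candidates x 3 folds = 15 fits
    cv=3,
    verbose=1,
    random_state=1,
    n_jobs=N_JOBS,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best candidate

# The refitted best estimator can be used for prediction directly.
y_pred = search.predict(X_test)
```

Once a small run like this works, n_iter and the grid can be scaled back up step by step while watching the run time.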

0 answers:

No answers yet