GridSearchCV for a random forest on a sentiment analysis dataset

Time: 2018-03-25 14:54:46

Tags: python-3.x random-forest grid-search

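I am tuning a random forest to compare results across parameter settings. I was able to do this for SVMs with GridSearchCV, but I am having trouble getting the same kind of results for the random forest. When I run the model, I get the error shown below.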

# Create a validation dataset
from sklearn.model_selection import train_test_split

# Split out the validation dataset
X = df.iloc[:, 1:18]  # feature columns
Y = df.iloc[:, 0]     # class labels
validation_size = 0.20
#seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=validation_size, random_state=0)
# Test options and evaluation metric
num_folds = 10
num_instances = len(X_train)
scoring = 'accuracy'

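I used the following code to set the parameters. Please help me resolve the problem I hit when running it on a sentiment analysis dataset. The dataset is in CSV format.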

tuned_parameters = [{RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False)}]
X, Y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, Y)

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(clf, tuned_parameters, cv=10,
                       scoring='%s_macro' % score)
    clf.fit(X_train, Y_train)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()
    print("Grid scores on development set:")
    print()
    means = clf.cv_results_['mean_test_score']
    stds = clf.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    print()

    print("Detailed classification report:")
    print()
  #  print("The model is trained on the full development set.")
  #  print("The scores are computed on the full evaluation set.")
    print()
    y_true, y_pred = Y_test ,  clf.predict(X_test)
    print(classification_report(y_true, y_pred))
    print()

Setting the parameters by cross-validation
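For reference, `tuned_parameters` in the code above is a set containing a `RandomForestClassifier` instance, whereas `GridSearchCV` expects its `param_grid` argument to be a dict (or a list of dicts) mapping parameter names to lists of candidate values. The loop also rebinds `clf` to the search object, so the second iteration would wrap a search inside another search. Below is a minimal sketch of the expected shape. It assumes synthetic data from `make_classification` in place of the CSV, and the candidate values and `cv=3` are illustrative choices that do not come from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the sentiment CSV (assumption: 17 numeric features).
X, Y = make_classification(n_samples=300, n_features=17, n_informative=5,
                           random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=0)

# Parameter names map to lists of candidate values (values are illustrative).
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, 4],
}

for score in ["precision", "recall"]:
    # Build a fresh search per metric instead of rebinding `clf` to itself.
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=3, scoring="%s_macro" % score)
    search.fit(X_train, Y_train)
    print("Best parameters for %s:" % score, search.best_params_)
    print(classification_report(Y_test, search.predict(X_test)))
```

Each key in `param_grid` must be a constructor argument of the estimator, and the grid here yields 2 x 2 = 4 candidate settings per metric.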

1 answer:

Answer 0 (score: -1)

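Have you tried setting the random seed at the start of your code? A random forest relies on random numbers, so the results will differ slightly from run to run.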

import numpy as np
np.random.seed(0)

My guess is that adding the line above will make your results reproducible.
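To illustrate the seed point: scikit-learn estimators take their randomness from the `random_state` argument, and when it is left as `None` they fall back to NumPy's global random state, which is what `np.random.seed` controls. The question's code already passes `random_state=0` to the classifier. A minimal sketch, assuming a small synthetic dataset (the sizes and the helper name `fit_predict` are hypothetical), that checks whether two forests built with the same `random_state` agree:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only (not the dataset from the question).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def fit_predict(seed):
    """Fit a small forest with the given seed and return class probabilities."""
    model = RandomForestClassifier(n_estimators=10, random_state=seed)
    return model.fit(X, y).predict_proba(X)

same_a = fit_predict(0)
same_b = fit_predict(0)

# Identical seeds should give identical forests and identical outputs.
print("Same seed, same output:", np.array_equal(same_a, same_b))
```

Passing `random_state` explicitly keeps the seeding local to the estimator, which is usually easier to reason about than relying on the global NumPy seed.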