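I have defined a binary classifier as shown below. When I call it with the 'gbc' method (Gradient Boosting Classifier), I get the following error on the last line: min_samples_split must be at least 2 or in (0, 1], got 1
featuresClasses is a data frame and featureLabels is the list of feature names.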
    import numpy as np
    from operator import itemgetter
    from sklearn import svm
    from sklearn.ensemble import GradientBoostingClassifier

    def Binary_classifier(method, featureLabels, featuresClasses):
        # Split by membershipId so that no member ends up in both sets
        membershipIds = list(set(featuresClasses['membershipId']))
        n_membershipIds = len(membershipIds)
        index_rand = np.random.permutation(n_membershipIds)
        test_size = int(0.3 * n_membershipIds)
        membershipIds_test = list(itemgetter(*index_rand[:test_size])(membershipIds))
        # Start at test_size (not test_size + 1) so that no member is dropped
        membershipIds_train = list(itemgetter(*index_rand[test_size:])(membershipIds))
        data_test = featuresClasses[featuresClasses['membershipId'].isin(membershipIds_test)]
        data_train = featuresClasses[featuresClasses['membershipId'].isin(membershipIds_train)]
        # Keep only the rows with a binary label
        data_test = data_test[data_test['standing'].isin([0, 1])]
        data_train = data_train[data_train['standing'].isin([0, 1])]
        # .values replaces the deprecated as_matrix()
        X_test = data_test[featureLabels].values
        y_test = data_test['standing'].values.astype(int)
        X_train = data_train[featureLabels].values
        y_train = data_train['standing'].values.astype(int)
        # -------------------------- Run classifier
        print('Binary classification by %s' % method)
        if method == 'svm':
            classifier = svm.SVC(kernel='linear', probability=True)
            y_score = classifier.fit(X_train, y_train).decision_function(X_test)
        elif method == 'gbc':
            params = {'n_estimators': 200, 'max_depth': 3, 'min_samples_split': 1, 'learning_rate': 0.1, 'loss': 'deviance'}
            classifier = GradientBoostingClassifier(**params)
            y_score = classifier.fit(X_train, y_train).predict(X_test)  # the error is raised here
Answer 0 (score: 4)
According to the GradientBoostingClassifier documentation:
min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node: If int, then consider min_samples_split as the minimum number. If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
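You specified 'min_samples_split': 1 in your code. That is not a valid value: as an int, the minimum is 2.
If you meant 1 as a float (i.e. a fraction of 1 * the number of samples, so that a node needs all samples before it can be split), then specify 'min_samples_split': 1.0. If you write 1, it is treated as an int, hence the error.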
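To make the int/float distinction concrete, here is a minimal plain-Python sketch of the rule as the documentation quoted above and the error message describe it. The helper name `resolve_min_samples_split` is hypothetical and chosen only for illustration; this is not scikit-learn's actual implementation, which may apply further adjustments.

```python
import math
import numbers


def resolve_min_samples_split(value, n_samples):
    """Hypothetical helper: turn a min_samples_split setting into a sample count.

    Follows the documented rule only; not taken from scikit-learn's source.
    """
    # Integers are checked first, so the int 1 is never read as the fraction 1.0
    if isinstance(value, numbers.Integral):
        if value < 2:
            raise ValueError(
                "min_samples_split must be at least 2 or in (0, 1], got %s" % value
            )
        return int(value)
    # Anything else is treated as a fraction of the samples and must lie in (0.0, 1.0]
    if not 0.0 < value <= 1.0:
        raise ValueError(
            "min_samples_split must be at least 2 or in (0, 1], got %s" % value
        )
    return int(math.ceil(value * n_samples))


print(resolve_min_samples_split(2, 100))    # int: used as an absolute count
print(resolve_min_samples_split(1.0, 100))  # float: fraction of n_samples
print(resolve_min_samples_split(0.5, 100))  # float: fraction of n_samples
```

Under this reading, passing the int 1 raises the same ValueError as in the question, while the float 1.0 is accepted and means "all samples".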
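The confusion comes from the wording of the error message, which shows the range as (0, 1] rather than (0.0, 1.0]. This has also been raised as an issue on scikit-learn's GitHub, and the fix is scheduled for the next release: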