Question

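I am testing a set of classifiers on some simple churn data. The columns are the cost of the plan, a count of events, and then a sparse one-hot encoding of the country (Other, US, Unknown):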

    $       Count   Other   US  Unknown
0   13.99   0       0       0   1
1   13.99   391     0       0   1
2   35.00   2       0       1   0
3   13.99   0       0       0   1
4   13.99   0       0       0   1

My code works when I feed this data to the Gradient Boosting classifier:

start = time()
estimator = GradientBoostingClassifier()
param = {"estimator__loss": ["deviance"],
       "estimator__n_estimators": [10, 50, 100],
       "estimator__max_depth": [2, 3, 4, 5],
       "estimator__max_features": ["auto","log2"]}
selector = RFECV(estimator, step=1, cv=3, scoring="roc_auc")
clf = grid_search.GridSearchCV(selector, param, cv=3, n_jobs=-1)
clf.fit(X,y)
print clf.best_estimator_.estimator_
print np.mean(clf.best_estimator_.grid_scores_)
print clf.best_estimator_.ranking_
print("Took %.2f seconds" % (time() - start))

However, if I run the same code with the AdaBoost classifier:

start = time()
estimator = AdaBoostClassifier(DecisionTreeClassifier())
param = {"estimator__n_estimators": [10, 50, 100]}
selector = RFECV(estimator, step=1, cv=3, scoring="roc_auc")
clf = grid_search.GridSearchCV(selector, param, cv=3, n_jobs=-1)
clf.fit(X,y)
print clf.best_estimator_.estimator_
print np.mean(clf.best_estimator_.grid_scores_)
print clf.best_estimator_.ranking_
print("Took %.2f seconds" % (time() - start))

I get ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). I checked for NaN and infinity with:

print np.isnan(X).any() #False
print np.isfinite(X).any() #True

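I also checked the AdaBoostClassifier documentation to see whether my input values were too large; the largest value in the Count column is under 4000. I am not sure what my next step should be to resolve this error. Any help is greatly appreciated!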
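
One possible next diagnostic step, offered only as a minimal sketch: `np.isfinite(X).any()` is True as soon as a single entry is finite, so it cannot rule out infinite values, whereas `.all()` is the stricter test. The helper name `report_non_finite` and the demo arrays below are hypothetical stand-ins for the real `X` and `y`, and this check does not by itself explain the error. It only confirms whether the inputs are clean.

```python
import numpy as np


def report_non_finite(arr, name="array"):
    """Print NaN/inf counts for a numeric array; return True if all entries are finite."""
    arr = np.asarray(arr, dtype=np.float64)
    n_nan = int(np.isnan(arr).sum())
    n_inf = int(np.isinf(arr).sum())
    all_finite = bool(np.isfinite(arr).all())
    print("%s: all finite=%s, NaN=%d, inf=%d" % (name, all_finite, n_nan, n_inf))
    return all_finite


# Hypothetical demo data shaped like the table in the question.
X_demo = np.array([[13.99, 0, 0, 0, 1],
                   [13.99, 391, 0, 0, 1],
                   [35.00, 2, 0, 1, 0]])

# Same data with one infinite value injected, to show the pitfall.
X_bad = X_demo.copy()
X_bad[0, 1] = np.inf

report_non_finite(X_demo, "X_demo")
report_non_finite(X_bad, "X_bad")

# .any() still reports True for the corrupted array; .all() does not.
any_check = bool(np.isfinite(X_bad).any())
all_check = bool(np.isfinite(X_bad).all())
```

Running the same helper on the target vector `y` would rule out a problem there as well.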

Sklearn AdaBoostClassifier errors on the same data as GradientBoostingClassifier

0 answers: