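I am trying to use GridSearchCV and RandomizedSearchCV to find the best parameters for two unsupervised learning algorithms (used for novelty detection), namely OneClassSVM and LocalOutlierFactor, both from sklearn.

Below is the function I wrote (adapted from this example):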

    from time import time

    import numpy as np
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


    def gridsearch(clf, param_dist_rand, param_grid_exhaustive, X):
        def report(results, n_top=3):
            # Print the n_top best-ranked parameter settings.
            for i in range(1, n_top + 1):
                candidates = np.flatnonzero(results['rank_test_score'] == i)
                for candidate in candidates:
                    print("Model with rank: {0}".format(i))
                    print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                        results['mean_test_score'][candidate],
                        results['std_test_score'][candidate]))
                    print("Parameters: {0}".format(results['params'][candidate]))
                    print("")

        # Randomized search over the parameter distributions.
        n_iter_search = 20
        random_search = RandomizedSearchCV(clf,
                                           param_distributions=param_dist_rand,
                                           n_iter=n_iter_search, cv=5,
                                           error_score=np.nan, scoring='accuracy')
        start = time()
        random_search.fit(X)
        print("RandomizedSearchCV took %.2f seconds for %d candidates"
              " parameter settings." % ((time() - start), n_iter_search))
        report(random_search.cv_results_)

        # Exhaustive search over the parameter grid.
        grid_search = GridSearchCV(clf, param_grid=param_grid_exhaustive,
                                   cv=5, error_score=np.nan, scoring='accuracy')
        start = time()
        grid_search.fit(X)
        print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
              % (time() - start, len(grid_search.cv_results_['params'])))
        report(grid_search.cv_results_)

To test the function above, I use the following code:

    import scipy.stats
    from sklearn import svm
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import LocalOutlierFactor

    X, W = train_test_split(all_data, test_size=0.2, random_state=42)

    clf_lof = LocalOutlierFactor(novelty=True, contamination='auto')
    lof_param_dist_rand = {'n_neighbors': np.arange(6, 101, 1),
                           'leaf_size': np.arange(30, 101, 10)}
    lof_param_grid_exhaustive = {'n_neighbors': np.arange(6, 101, 1),
                                 'leaf_size': np.arange(30, 101, 10)}
    gridsearch(clf=clf_lof, param_dist_rand=lof_param_dist_rand,
               param_grid_exhaustive=lof_param_grid_exhaustive, X=X)

    clf_svm = svm.OneClassSVM()
    svm_param_dist_rand = {'nu': np.arange(0, 1.1, 0.01),
                           'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                           'degree': np.arange(0, 7, 1),
                           'gamma': scipy.stats.expon(scale=.1)}
    svm_param_grid_exhaustive = {'nu': np.arange(0, 1.1, 0.01),
                                 'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
                                 'degree': np.arange(0, 7, 1),
                                 'gamma': [0.25]}  # grid values must be sequences, not scalars
    gridsearch(clf=clf_svm, param_dist_rand=svm_param_dist_rand,
               param_grid_exhaustive=svm_param_grid_exhaustive, X=X)

Initially, I did not set the scoring parameter for either of the two search methods, and I got this error:

    TypeError: If no scoring is specified, the estimator passed should have a 'score' method.

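As far as I can tell, this first error comes from the fact that neither estimator defines a score method, so the search has no default metric to fall back on. A quick check, as a minimal sketch (the has_score dictionary is only an illustrative name, not part of my real code):

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Record, for each estimator, whether it exposes a default score() method.
has_score = {}
for est in (OneClassSVM(), LocalOutlierFactor(novelty=True)):
    name = type(est).__name__
    has_score[name] = hasattr(est, "score")
    # Both estimators do expose predict(), which returns +1 (inlier) or -1 (outlier).
    print(name, "score():", has_score[name], "predict():", hasattr(est, "predict"))
```

Both entries come out False, which matches the message: without a scoring argument, the search needs the estimator itself to provide score.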
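Then I added scoring='accuracy', because I want to use the test accuracy to judge the performance of the different model configurations. Now I get this error: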

    TypeError: __call__() missing 1 required positional argument: 'y_true'

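I have no labels, because my data comes from a single class and I have no data from the opposite (counter) class, so I do not know how to deal with this error. I also looked at the suggestions in this question, but they did not help. Any help would be greatly appreciated.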
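One direction that might be worth exploring, sketched here under assumptions rather than as a confirmed fix: the scoring argument also accepts a callable with the signature (estimator, X, y), so a scorer that ignores y can be used when fit is called without labels. The function name inlier_fraction, the synthetic X_demo array, and the small grid are all hypothetical stand-ins for illustration. If every sample really belongs to the normal class, the fraction of held-out samples predicted as inliers plays the role of accuracy.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import LocalOutlierFactor


def inlier_fraction(estimator, X, y=None):
    """Fraction of held-out samples predicted as inliers (+1); y is ignored."""
    return float(np.mean(estimator.predict(X) == 1))


# Synthetic stand-in for the real one-class data (four features, as in the post).
rng = np.random.RandomState(42)
X_demo = rng.normal(size=(200, 4))

search = GridSearchCV(
    LocalOutlierFactor(novelty=True, contamination='auto'),
    param_grid={'n_neighbors': [10, 20, 30]},  # deliberately small grid
    cv=5,
    scoring=inlier_fraction,  # callable scorer, no labels required
)
search.fit(X_demo)  # fit without y
print(search.best_params_, search.best_score_)
```

A caveat on this design: the criterion only measures how rarely normal data is flagged, so on its own it favors models that accept almost everything. Without any samples from the opposite class it cannot tell whether real novelties would be detected.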
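Edit:

Following @FChm's suggestion to provide sample data, please find a sample .csv data file here. A short description of the file: it consists of four columns of features (generated by PCA) that I feed to my models.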