I get an error from the following code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA
from sklearn.grid_search import GridSearchCV
from sklearn import linear_model, mixture, decomposition, datasets
# load the data
digits = load_digits()
data = digits.data
pca = PCA(n_components=15, whiten=False)
data = pca.fit_transform(digits.data)
gmm = mixture.GMM()
# use grid search cross-validation
params = {'gmm__n_components':(2, 3)}
grid = GridSearchCV(gmm, params)
grid.fit(data)
ERROR:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-07b1b825ee22> in <module>()
22
23 grid = GridSearchCV(gmm, params)
---> 24 grid.fit(data)
25
C:\Anaconda2\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)
802
803 """
--> 804 return self._fit(X, y, ParameterGrid(self.param_grid))
805
806
C:\Anaconda2\lib\site-packages\sklearn\grid_search.pyc in _fit(self, X, y, parameter_iterable)
551 self.fit_params, return_parameters=True,
552 error_score=self.error_score)
--> 553 for parameters in parameter_iterable
554 for train, test in cv)
555
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
802 self._iterating = True
803
--> 804 while self.dispatch_one_batch(iterator):
805 pass
806
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch_one_batch(self, iterator)
660 return False
661 else:
--> 662 self._dispatch(tasks)
663 return True
664
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in _dispatch(self, batch)
568
569 if self._pool is None:
--> 570 job = ImmediateComputeBatch(batch)
571 self._jobs.append(job)
572 self.n_dispatched_batches += 1
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __init__(self, batch)
181 # Don't delay the application, to avoid keeping the input
182 # arguments in memory
--> 183 self.results = batch()
184
185 def get(self):
C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self)
70
71 def __call__(self):
---> 72 return [func(*args, **kwargs) for func, args, kwargs in self.items]
73
74 def __len__(self):
C:\Anaconda2\lib\site-packages\sklearn\cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
1518
1519 if parameters is not None:
-> 1520 estimator.set_params(**parameters)
1521
1522 start_time = time.time()
C:\Anaconda2\lib\site-packages\sklearn\base.pyc in set_params(self, **params)
259 'Check the list of available parameters '
260 'with `estimator.get_params().keys()`.' %
--> 261 (name, self))
262 sub_object = valid_params[name]
263 sub_object.set_params(**{sub_name: value})
ValueError: Invalid parameter gmm for estimator GMM(covariance_type='diag', init_params='wmc', min_covar=0.001,
n_components=1, n_init=1, n_iter=100, params='wmc', random_state=None,
thresh=None, tol=0.001, verbose=0). Check the list of available parameters with `estimator.get_params().keys()`.
Similar code from the scikit-learn site works fine (see below). The only difference from my code above is the algorithm. Does that make a difference? How can I fix this? Thanks.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA
from sklearn.grid_search import GridSearchCV
# load the data
digits = load_digits()
data = digits.data
# project the 64-dimensional data to a lower dimension
pca = PCA(n_components=15, whiten=False)
data = pca.fit_transform(digits.data)
# use grid search cross-validation to optimize the bandwidth
params = {'bandwidth': np.logspace(-1, 1, 20)}
grid = GridSearchCV(KernelDensity(), params)
grid.fit(data)
print("best bandwidth: {0}".format(grid.best_estimator_.bandwidth))
Answer 0 (score: 0)
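I see two problems with your code.
First, because you are passing a single estimator to GridSearchCV, the parameter names in the parameter grid should not start with the gmm__ prefix. That prefix is what set_params rejects in the ValueError above ("Invalid parameter gmm for estimator GMM"). Removing it gets you past the error quoted above. You can change the parameter grid assignment as follows: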
params = {'n_components':(2, 3)}
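To see why the prefix matters, it can help to list the parameter names each object actually accepts. The sketch below is only an illustration and makes two assumptions that are not in the question: it uses GaussianMixture (the class that replaced mixture.GMM in newer scikit-learn releases), and it builds a hypothetical Pipeline with a step named "gmm" to show the one situation where a gmm__ prefix would be valid.

```python
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline

# A bare estimator exposes its parameters under their plain names.
gmm = GaussianMixture()
bare_names = sorted(gmm.get_params().keys())

# Hypothetical pipeline: only here does the "<step name>__" prefix apply.
pipe = Pipeline([("pca", PCA(n_components=15)), ("gmm", GaussianMixture())])
pipe_names = sorted(pipe.get_params().keys())

print("n_components" in bare_names)        # plain name on the bare estimator
print("gmm__n_components" in bare_names)   # prefixed name on the bare estimator
print("gmm__n_components" in pipe_names)   # prefixed name on the pipeline
```

If the PCA step and the mixture model were wrapped in a pipeline like this one, the original 'gmm__n_components' key would be the right spelling; for a bare estimator it is not.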
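Once you get past that error, however, you will run into a second problem. GMM.score() returns an array rather than a single score value. In this respect it differs from what sklearn does for KMeans, KernelDensity, PCA, and so on (see the discussion in this issue: https://github.com/scikit-learn/scikit-learn/issues/2473). The array of scores from GMM causes GridSearchCV to throw an error, because it expects a single value. The example you took from the sklearn website uses KernelDensity, so it does not have this problem.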
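For comparison, here is a minimal sketch under the assumption that a newer scikit-learn is available, where GridSearchCV lives in sklearn.model_selection and mixture.GMM has been replaced by GaussianMixture. In those releases GaussianMixture.score() returns a single average log-likelihood, so the grid search from the question can run. The covariance_type, random_state, and cv values are illustrative choices, not taken from the question.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

# Same preprocessing as in the question: project the digits to 15 dimensions.
data = PCA(n_components=15, whiten=False).fit_transform(load_digits().data)

# Assumption: GaussianMixture stands in for the older mixture.GMM.
gmm = GaussianMixture(covariance_type="diag", random_state=0)

# score() gives one number (mean log-likelihood per sample), not an array.
single_score = gmm.fit(data).score(data)
print(np.ndim(single_score))

# With a scalar score, GridSearchCV can rank the candidates.
params = {"n_components": (2, 3)}
grid = GridSearchCV(gmm, params, cv=3)
grid.fit(data)
print(grid.best_params_)
```

Because the score is a log-likelihood, higher is better, which matches what GridSearchCV expects from an estimator's default score method.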
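I would suggest using another algorithm whose score function matches what GridSearchCV expects, such as KMeans or KernelDensity. Alternatively, you can run gmm.fit() separately for each n_components value you want to test and compare the results in whatever way suits you best.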
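The manual alternative could be sketched roughly as follows. The answer leaves the comparison criterion open; BIC is used here purely as one possible choice, and GaussianMixture is assumed in place of the older mixture.GMM. The names bic_by_k and best_k are made up for this illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Same preprocessing as in the question.
data = PCA(n_components=15, whiten=False).fit_transform(load_digits().data)

# Fit one model per candidate n_components and record its BIC.
bic_by_k = {}
for k in (2, 3):
    model = GaussianMixture(n_components=k, covariance_type="diag", random_state=0)
    model.fit(data)
    bic_by_k[k] = model.bic(data)  # lower BIC is better

# Pick the candidate with the smallest BIC.
best_k = min(bic_by_k, key=bic_by_k.get)
print(bic_by_k)
print("best n_components by BIC:", best_k)
```

Unlike the cross-validated grid search, this loop scores each model on the data it was fitted to, which is why a penalized criterion such as BIC is used instead of the raw likelihood.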