Question

I am using GridSearchCV to find the best parameters for my pipeline.

The pipeline itself seems to work fine when I apply it directly:

pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)

I get decent results.

But GridSearchCV clearly doesn't like something, and I can't figure out what.

My pipeline:

feats = FeatureUnion([('age', age),
                      ('education_num', education_num),
                      ('is_education_favo', is_education_favo),
                      ('is_marital_status_favo', is_marital_status_favo),
                      ('hours_per_week', hours_per_week),
                      ('capital_diff', capital_diff),
                      ('sex', sex),
                      ('race', race),
                      ('native_country', native_country)
                     ])

pipeline = Pipeline([
        ('adhocFC',AdHocFeaturesCreation()),
        ('imputers', KnnImputer(target = 'native-country', n_neighbors = 5)),
        ('features',feats),('clf',LogisticRegression())])

My GridSearch:

hyperparameters = {'imputers__n_neighbors' : [5,21,41], 'clf__C' : [1.0, 2.0]}

GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' , refit = False) #change n_jobs = 2, refit = False

GSCV.fit(X_train, y_train)

I get 11 warnings like this one:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

And this is the error message:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:14: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <module>()
      3 GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' , refit = False) #change n_jobs = 2, refit = False
      4
----> 5 GSCV.fit(X_train, y_train)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups)
    943             train/test set.
    944         """
--> 945         return self._fit(X, y, groups, ParameterGrid(self.param_grid))
    946
    947

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in _fit(self, X, y, groups, parameter_iterable)
    562                                   return_times=True, return_parameters=True,
    563                                   error_score=self.error_score)
--> 564           for parameters in parameter_iterable
    565           for train, test in cv_iter)
    566

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327
    328     def get(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    236             estimator.fit(X_train, **fit_params)
    237         else:
--> 238             estimator.fit(X_train, y_train, **fit_params)
    239
    240     except Exception as e:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    266             This estimator
    267         """
--> 268         Xt, fit_params = self._fit(X, y, **fit_params)
    269         if self._final_estimator is not None:
    270             self._final_estimator.fit(Xt, y, **fit_params)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    232                 pass
    233             elif hasattr(transform, "fit_transform"):
--> 234                 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
    235             else:
    236                 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    495         else:
    496             # fit method of arity 2 (supervised transformation)
--> 497             return self.fit(X, y, **fit_params).transform(X)
    498
    499

in fit(self, X, y)
     16         self.ohe.fit(X_full)
     17         # Create a dataframe that does not contain any null value, categorical variables are OHE, with each row
---> 18         X_ohe_full = self.ohe.transform(X_full[~X[self.col].isnull()].drop(self.col, axis=1))
     19
     20         # Fit the classifier on the rows where col is null

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060
   2061     def _getitem_column(self, key):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067
   2068         # duplicate columns & possible reduce dimensionality

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3550                         loc = indexer.item()
   3551                     else:
-> 3552                         raise ValueError("cannot label index with a null key")
   3553
   3554             return self.iget(loc, fastpath=fastpath)

ValueError: cannot label index with a null key

Answer 1

Without more information, I believe this happens because your y_train and X_train variables are pandas DataFrames, which the underlying scikit-learn library does not handle well here: for example, a classifier's .fit method expects an array-like object.

By passing in pandas DataFrames, you are inadvertently indexing them as if they were NumPy arrays, which is not reliable in pandas.

Try converting your training data to NumPy arrays:

y_train
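A minimal sketch of such a conversion, assuming small made-up data in place of the question's real X_train and y_train (the column names and values below are hypothetical, as are the X_train_arr and y_train_arr names):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's X_train / y_train.
X_train = pd.DataFrame({"age": [25, 38, 52], "hours_per_week": [40, 50, 35]})
y_train = pd.Series([0, 1, 1], name="income")

# np.asarray returns the values as plain NumPy arrays,
# dropping the pandas index and the column labels.
X_train_arr = np.asarray(X_train)
y_train_arr = np.asarray(y_train)

print(type(X_train_arr).__name__, X_train_arr.shape)
print(type(y_train_arr).__name__, y_train_arr.shape)
```

Keep in mind that any pipeline step that selects columns by name, like the X[self.col] lookup visible in the traceback, relies on DataFrame input and would have to be adapted to positional indexing before it could accept arrays.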

GridSearchCV on a working pipeline returns ValueError