KerasClassifier problem when using custom scoring with GridSearchCV for a NN with multiclass output

Time: 2018-03-04 11:07:40

Tags: scikit-learn keras cross-validation grid-search scoring

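Using custom scoring with the multiclass output of a Keras model raises the same error in both cross_val_score and GridSearchCV, as shown below (it uses Iris, so you can run it directly to test):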

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier

iris = datasets.load_iris()
X= iris.data
Y = to_categorical(iris.target)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(Dense(8,activation='relu',input_shape = (4,)))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer = optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


model = KerasClassifier(build_fn=create_model,
                        epochs=10, 
                        batch_size=5,
                        verbose=0)

#results = cross_val_score(model, X_train, Y_train, scoring='precision_macro')

param_grid = {'optimizer':('rmsprop','adam')}
grid = GridSearchCV(model,
                    param_grid=param_grid,
                    return_train_score=True,
                    scoring=['accuracy','precision_macro','recall_macro'],
                    refit='precision_macro')

grid_results = grid.fit(X_train,Y_train)

So I get the error below.

I have left out the full stack trace, since you can reproduce it by copying the code above.

ValueError: Classification metrics can't handle a mix of multilabel-indicator and binary targets

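When I remove the scoring parameter, it works fine.

Is there a way to avoid this and enable f1, precision, or any custom score? Without rewriting my own grid search code, of course.

Thanks for your help.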

Update: I just found a workaround.

First, this doc (http://scikit-learn.org/stable/modules/multiclass.html#multilabel-classification-format) shows that the one-hot representation used in Keras is interpreted as multilabel in scikit-learn.
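A minimal sketch of that mismatch, using only scikit-learn and NumPy. The tiny arrays are made up for illustration, and np.eye stands in for keras.utils.to_categorical. It presumably mirrors what happens inside a cross-validation fold: the targets are one-hot, while the wrapper's predictions are 1-D class indices.

```python
import numpy as np
from sklearn.metrics import precision_score
from sklearn.utils.multiclass import type_of_target

# Integer class labels, the scikit-learn convention for multiclass targets.
y_int = np.array([0, 1, 2, 1, 0, 2])

# One-hot encoding of the same labels (NumPy stand-in for to_categorical).
y_onehot = np.eye(3, dtype=int)[y_int]

# Illustrative predictions that contain only two distinct classes,
# as a classifier might return on a small fold.
y_pred = np.array([0, 1, 1, 1, 0, 1])

target_types = {
    "integer labels": type_of_target(y_int),
    "one-hot labels": type_of_target(y_onehot),
    "predictions": type_of_target(y_pred),
}
for name, kind in target_types.items():
    print(f"{name}: {kind}")

# Scoring one-hot targets against 1-D class predictions is rejected.
try:
    precision_score(y_onehot, y_pred, average="macro")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The target types differ between the one-hot array and the 1-D predictions, which is the mix that the ValueError complains about.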

Then look at scikit_learn.py, which implements the KerasClassifier class: https://github.com/keras-team/keras/blob/master/keras/wrappers/scikit_learn.py

The fit function of the BaseWrapper class contains the following lines:

if loss_name == 'categorical_crossentropy' and len(y.shape) != 2:
    y = to_categorical(y)

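So the wrapper performs the categorical conversion itself.

It seems that, to work around the difference between the Keras and scikit-learn multiclass representations, the Keras wrapper can accept scikit-learn-style multiclass labels such as [0,1,2,1,0,2] and convert them to the categorical representation only for fitting the NN model.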
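The conversion described above can be sketched in plain NumPy. The helper name to_one_hot_if_needed is hypothetical, and np.eye is used as a stand-in for keras.utils.to_categorical; this is an illustration of the idea, not the wrapper's actual code.

```python
import numpy as np

def to_one_hot_if_needed(y, num_classes):
    """Hypothetical helper: one-hot encode only when y is not already 2-D."""
    y = np.asarray(y)
    if y.ndim != 2:
        # NumPy stand-in for keras.utils.to_categorical(y)
        y = np.eye(num_classes, dtype="float32")[y]
    return y

# scikit-learn-style multiclass labels
labels = [0, 1, 2, 1, 0, 2]

# Encoded only for the network; the caller keeps the integer labels.
encoded = to_one_hot_if_needed(labels, num_classes=3)

# An array that is already 2-D passes through untouched.
unchanged = to_one_hot_if_needed(encoded, num_classes=3)

# argmax recovers the original class indices from the one-hot rows.
recovered = encoded.argmax(axis=1)
```

Because the encoding happens only at fit time, the targets that scikit-learn sees during scoring stay in the integer format its metrics expect.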

So I simply tried dropping the categorical conversion of Y before passing the model to the sklearn functions.

It now works:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier

iris = datasets.load_iris()
X= iris.data
#Y = to_categorical(iris.target,3)
Y = iris.target

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)

def create_model(optimizer='rmsprop'):
    model = Sequential()
    model.add(Dense(8,activation='relu',input_shape = (4,)))
    model.add(Dense(3,activation='softmax'))
    model.compile(optimizer = optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


model = KerasClassifier(build_fn=create_model,
                        epochs=10, 
                        batch_size=5,
                        verbose=0)

#results = cross_val_score(model, X_train, Y_train, scoring='precision_macro')

param_grid = {'optimizer':('rmsprop','adam')}
grid = GridSearchCV(model,
                    param_grid=param_grid,
                    return_train_score=True,
                    scoring=['precision_macro','recall_macro','f1_macro'],
                    refit='precision_macro')
grid_results = grid.fit(X_train,Y_train)

0 answers:

No answers