Comparing Keras metrics with those of sklearn.classification_report

Asked: 2017-02-16 16:08:36

Tags: python-2.7 scikit-learn keras metrics difference

I am struggling with different metrics while evaluating neural networks. My investigation shows that Keras (version 1.2.2) calculates different values for certain metrics (using model.evaluate) compared to sklearn's classification_report.

Specifically, the values for the metric 'precision' (i.e. precision of Keras != precision of sklearn) and for the metric 'recall' (i.e. recall of Keras != recall of sklearn) differ. For the following working example the differences seem random, but evaluating larger networks shows that 'precision' of Keras is (almost) equal to 'recall' of sklearn, whereas both 'recall' metrics differ clearly.
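For context, the built-in 'precision' metric in Keras 1.x appears to be computed globally per batch from the rounded one-hot predictions and then averaged over batches, roughly like the sketch below (precision_sketch is my own name; the exact built-in code may differ), which is a different aggregation than sklearn's per-class, support-weighted average:

from keras import backend as K

def precision_sketch(y_true, y_pred):
    # entries counted as positive after rounding the (softmax) probabilities
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    # one global precision value for the whole batch, not a per-class average
    return true_positives / (predicted_positives + K.epsilon())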

Thanks for your help!

from __future__ import print_function 
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils # numpy utils for to_categorical()
from keras import backend as K  # abstract backend API (in order to generate compatible code for Theano and Tf)
from sklearn.metrics import classification_report

batch_size = 128
nb_classes = 10
nb_epoch = 30

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # range [0,1]
X_test /= 255 # range [0,1]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes) # necessary for use of categorical_crossentropy 
Y_test = np_utils.to_categorical(y_test, nb_classes) # necessary for use of categorical_crossentropy 

# create model
model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

# configure model
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy', 'precision', 'recall'])

# train model
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))

# evaluate model with keras
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
print('Test precision:', score[2])
print('Test recall:', score[3])

# evaluate model with sklearn
predictions_last_epoch = model.predict(X_test, batch_size=batch_size, verbose=1)
target_names = ['class 0', 'class 1', 'class 2', 'class 3', 'class 4', 
                    'class 5', 'class 6', 'class 7', 'class 8', 'class 9']

predicted_classes = np.argmax(predictions_last_epoch, axis=1)
print('\n')
print(classification_report(y_test, predicted_classes, 
        target_names=target_names, digits = 6))

EDIT

The script given above produces this output:

Test score: 0.0271549037314
Test accuracy: 0.9916
Test precision: 0.992290322304
Test recall: 0.9908


9728/10000 [============================>.] - ETA: 0s

         precision    recall  f1-score   support

class 0   0.987867  0.996939  0.992382       980
class 1   0.993860  0.998238  0.996044      1135
class 2   0.990329  0.992248  0.991288      1032
class 3   0.991115  0.994059  0.992585      1010
class 4   0.994882  0.989817  0.992343       982
class 5   0.991041  0.992152  0.991597       892
class 6   0.993678  0.984342  0.988988       958
class 7   0.992180  0.987354  0.989761      1028
class 8   0.989754  0.991786  0.990769       974
class 9   0.991054  0.988107  0.989578      1009

avg / total   0.991607  0.991600  0.991597     10000

Another model:

val/test loss: 0.231304548573
val/test categorical_accuracy: **0.978500002956**
val/test precision: *0.995103668976*
val/test recall: 0.941900001907
val/test fbeta_score: 0.967675107574
val/test mean_squared_error: 0.0064611148566
10000/10000 [==============================] - 0s     


         precision    recall  f1-score   support

class 0   0.989605  0.971429  0.980433       980
class 1   0.985153  0.993833  0.989474      1135
class 2   0.988154  0.969961  0.978973      1032
class 3   0.981373  0.991089  0.986207      1010
class 4   0.968907  0.983707  0.976251       982
class 5   0.997633  0.945067  0.970639       892
class 6   0.995690  0.964509  0.979852       958
class 7   0.987230  0.977626  0.982405      1028
class 8   0.945205  0.991786  0.967936       974
class 9   0.951429  0.990089  0.970374      1009

avg / total   *0.978964*  **0.978500**  0.978522     10000

Definition of the requested metrics (for model.compile):

metrics=['categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']

model.compile(loss='categorical_crossentropy',
            optimizer='sgd',
            metrics=metrics)

Output of model.metrics_names:

['loss', 'categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']
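As a small aside, rather than hard-coding the indices of score as in the script above, each evaluated value can be paired with its name via model.metrics_names; a minimal sketch reusing the variables from the script:

# Print each evaluated metric next to its name instead of indexing by position
score = model.evaluate(X_test, Y_test, verbose=0)
for name, value in zip(model.metrics_names, score):
    print('val/test %s:' % name, value)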

1 Answer:

Answer 0 (score: 2)

Yes, they differ because sklearn's classification_report gives a support-weighted average: each per-class score is weighted by the number of true instances (the support) of that class.

Try:

from sklearn.metrics import classification_report
y_true = [0, 1,2,1]
y_pred = [0, 0,2,0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

gives you:

             precision    recall  f1-score   support

    class 0       0.33      1.00      0.50         1
    class 1       0.00      0.00      0.00         2
    class 2       1.00      1.00      1.00         1

avg / total       0.33      0.50      0.38         **4**

However, the unweighted (macro) average would be (0.33 + 0.00 + 1.00) / 3 ≈ 0.44, whereas, as the support column indicates, sklearn returns the support-weighted average (0.33·1 + 0.00·2 + 1.00·1) / 4 ≈ 0.3325 (using the rounded per-class values from the report).
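This can be checked directly with sklearn's averaging options; a short sketch reusing the toy labels above (precision_score is a standard sklearn function, so the only assumption is the choice of the 'macro' and 'weighted' averages):

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 1]
y_pred = [0, 0, 2, 0]

# unweighted mean of the per-class precisions: (1/3 + 0 + 1) / 3
print(precision_score(y_true, y_pred, average='macro'))     # ~0.444
# support-weighted mean, which the report's avg row reflects: (1/3*1 + 0*2 + 1*1) / 4
print(precision_score(y_true, y_pred, average='weighted'))  # ~0.333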