One-hot encoding problem with np_utils.to_categorical

Time: 2018-12-28 14:23:41

Tags: python keras

I am trying to predict the classes of an image corpus.

The classes of the training data are in a list that contains 4 distinct values: 1, 2, 4, 5.

But np_utils.to_categorical(theList) gives me 6 dimensions instead of 4.

Can you help me find out why?

2 answers:

Answer 0: (score: 2)

According to the Keras documentation:

    to_categorical: Converts a class vector (integers) to a binary class matrix.

    Arguments:
        y: class vector to be converted into a matrix (integers from 0 to num_classes).

The input to to_categorical is expected to be a list of zero-based integers. In your example the list is [1, 2, 4, 5], which is converted into 6 different classes (0 to 5):

[[0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1.]]
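When num_classes is not given, to_categorical infers it as max(y) + 1, which is why [1, 2, 4, 5] yields 6 columns. A minimal numpy sketch of that sizing rule (not the Keras source, just the same logic):

```python
import numpy as np

def to_categorical_sketch(y, num_classes=None):
    """Mimic keras.utils.to_categorical: one column per class 0..max(y)."""
    y = np.asarray(y, dtype=int)
    if num_classes is None:
        num_classes = y.max() + 1  # labels are assumed to start at 0
    one_hot = np.zeros((y.shape[0], num_classes))
    one_hot[np.arange(y.shape[0]), y] = 1.0
    return one_hot

print(to_categorical_sketch([1, 2, 4, 5]).shape)  # (4, 6): max label is 5, so 6 classes
```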

If you want the number of columns to match the number of distinct labels, you need to transform the labels first with LabelEncoder from the sklearn package. It encodes labels with values between 0 and n_classes-1. So if you pass the list [1, 2, 4, 5] to LabelEncoder, it is transformed into:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
print(le.fit_transform([1, 2, 4, 5]))
>>> [0 1 2 3]

As you can see, the labels now start at zero, which is what the to_categorical method needs. The last step is to pass this list to to_categorical:

from keras.utils import to_categorical

new_labels = le.fit_transform([1, 2, 4, 5])
one_hot = to_categorical(new_labels)
print(one_hot)

>>> [[1. 0. 0. 0.]
    [0. 1. 0. 0.]
    [0. 0. 1. 0.]
    [0. 0. 0. 1.]]

Note that OneHotEncoder also exists in the sklearn package, but in this case LabelEncoder is needed as well.
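If sklearn is not available, the same re-indexing can be done with numpy alone: np.unique(..., return_inverse=True) plays the role of LabelEncoder and also keeps the mapping back to the original labels (a sketch, not the sklearn implementation):

```python
import numpy as np

labels = np.array([1, 2, 4, 5])
# classes holds the sorted distinct labels; encoded holds each label's index in classes
classes, encoded = np.unique(labels, return_inverse=True)
print(encoded)  # [0 1 2 3]
print(classes)  # [1 2 4 5] -- keeps the mapping back to the original labels
```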

Answer 1: (score: 1)

Thanks for your reply.

I had already tried this (before your answer), and it seems to work (around 0.997 on every epoch except the first: 0.85):

from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
Y = np_utils.to_categorical(encoded_Y)

But the accuracy on the test corpus seems to be 0:

import numpy as np
from keras.models import load_model

model = load_model('model.h5')
# predict and evaluate
y_pred = model.predict_classes(X_test)
acc = np.sum(y_pred == Y_test) / np.size(y_pred)
print("Test accuracy = {}".format(acc))

Can you help me find out why?

The model's val_acc is fine, so I don't understand why the test accuracy is so bad.

EDIT: predict_classes returns classes like set([0, 1, 2, 3, 4, 5, 6, 7, 8]), but I compute the accuracy against labels like set([1, 2, 4, 5, 9, 12, 14, 19, 20]).
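The mismatch in this EDIT is exactly the problem: predict_classes returns encoded indices (0 to n_classes-1), while Y_test still holds the original label values, so the equality comparison is never true. One sketch of decoding the predictions before comparing, using hypothetical values for illustration (classes would come from the encoder fitted on the training labels):

```python
import numpy as np

# Hypothetical values: the model outputs encoded indices,
# Y_test still contains the original label values.
classes = np.array([1, 2, 4, 5, 9, 12, 14, 19, 20])  # e.g. encoder.classes_
y_pred_encoded = np.array([0, 1, 2, 3])               # output of predict_classes
Y_test = np.array([1, 2, 4, 5])                       # raw test labels

# Decode the predictions back to the original label values
y_pred = classes[y_pred_encoded]
acc = np.mean(y_pred == Y_test)
print(acc)  # 1.0
```

(With sklearn, `encoder.inverse_transform(y_pred_encoded)` does the same decoding.)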

How do I tell Keras to encode them correctly?

EDIT: problem solved:

encoder = LabelEncoder()
encoder.fit(Y_test)
Y_test = encoder.transform(Y_test)

I get 99.0574712644% accuracy.
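One caveat with fitting the encoder on Y_test: this only reproduces the training-time encoding if the test set happens to contain every class. Safer is to fit once on the training labels and only transform the test labels. A numpy sketch with hypothetical label arrays showing how re-fitting on the test set alone can shift the encoding:

```python
import numpy as np

Y_train = np.array([1, 2, 4, 5, 9])  # hypothetical training labels
Y_test = np.array([4, 5, 9])         # test set missing classes 1 and 2

classes = np.unique(Y_train)         # "fit" once, on the training labels
# "transform": index of each test label within the training classes
# (valid only because every test label appears in the training set)
Y_test_encoded = np.searchsorted(classes, Y_test)
print(Y_test_encoded)  # [2 3 4]

# Re-fitting on Y_test alone would give a different, incompatible encoding
refit = np.unique(Y_test, return_inverse=True)[1]
print(refit)           # [0 1 2]
```

With sklearn this corresponds to `encoder.fit(Y_train)` followed by `encoder.transform(Y_test)`.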

Train on 10584 samples, validate on 2646 samples
Epoch 1/30  - 315s 30ms/step - loss: 1.4155 - acc: 0.4357 - val_loss: 0.5511 - val_acc: 0.7827
Epoch 2/30  - 308s 29ms/step - loss: 0.3907 - acc: 0.8553 - val_loss: 0.1589 - val_acc: 0.9471
Epoch 3/30  - 284s 27ms/step - loss: 0.1351 - acc: 0.9552 - val_loss: 0.0342 - val_acc: 0.9872
...
Epoch 29/30 - 284s 27ms/step - loss: 0.0027 - acc: 0.9991 - val_loss: 0.0076 - val_acc: 0.9977
Epoch 30/30 - 282s 27ms/step - loss: 0.0033 - acc: 0.9988 - val_loss: 0.0074 - val_acc: 0.9974
Prediction

Test accuracy = 0