Question

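I have an application that uses CNNs to classify images. I am considering MobileNet, ResNet and DenseNet with an input size of 64 x 64 image blocks. To classify an image, I define its class as the class that occurs most often among the predictions for its blocks. The problem is very unbalanced: I have many more positive samples than negative ones. I am considering three datasets.

To tackle this, I first computed metrics such as F-measure, normalized accuracy and so on. Here are the normalized accuracy results for some of the datasets with the three CNNs:

To build the ROC curve, I decided to define the score of an image as the mean score of its blocks, and this is where my problem starts. Please have a look at some ROC curves for these datasets with the three CNNs:

It seems very strange to me that approaches which reach only 50% normalized accuracy also reach an AUC of 0.85, 0.90 and even 0.97. That last AUC looks like it comes from an almost perfect classifier, but how is that possible if its normalized accuracy is 50%?

So, what are the reasons for that? Is it because:

1- My problem is unbalanced. Positive samples are the most frequent in my datasets, and they are also the class of interest in my ROC. Does that affect the result?

2- I use the mean score of the blocks as the score for an image. Is there any way to overcome this problem?

Here is the code I use to generate the labels and scores (Python):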

    # Imports are inferred from the names used below (an assumption about the setup).
    import cv2
    import numpy as np
    from keras.applications import MobileNet
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model
    from skimage.util import view_as_windows

    # model_path, test_images_path, dataset_name, algorithm (the optimizer) and
    # the helper module du are defined elsewhere in the project.

    base_model = MobileNet(input_shape=(64, 64, 3), weights=None, include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(64, activation='relu')(x)
    predictions = Dense(2, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    model.load_weights(model_path)

    # "dense_2" is expected to be the final softmax layer, so this model outputs class scores
    intermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer("dense_2").output)
    print("Loaded model from disk")
    intermediate_layer_model.compile(loss='categorical_crossentropy', optimizer=algorithm, metrics=['accuracy'])

    # read images, divide them into blocks, predict the blocks and
    # define the mean block score as the score for an image
    with open(test_images_path) as f:
        images_list = f.readlines()
        images_name = [a.strip() for a in images_list]
        predicted_image_vector = []
        groundtruth_image_vector = []

        for line in images_name:
            x_test = []
            y_test = []
            print(line)
            image = cv2.imread(line, 1)
            # divide into non-overlapping 64 x 64 blocks
            windows = view_as_windows(image, (64, 64, 3), step=64)

            # prepare blocks to be tested later
            for i in range(windows.shape[0]):
                for j in range(windows.shape[1]):
                    block = np.squeeze(windows[i, j])
                    x_test.append(block)
                    label = du.define_class(line)
                    y_test.append(label)

            # predict scores for all blocks in the current test image
            # (this must run inside the per-image loop)
            intermediate_output = intermediate_layer_model.predict(np.asarray(x_test), batch_size=32, verbose=0)
            # the score for an image is the mean score of its blocks
            prediction_current_image = np.mean(intermediate_output, axis=0)
            predicted_image_vector.append(prediction_current_image)
            groundtruth_image_vector.append(np.argmax(np.bincount(np.asarray(y_test))))

    predicted_image_vector = np.array(predicted_image_vector)
    groundtruth_image_vector = np.array(groundtruth_image_vector)
    print("saving scores and labels to plot ROC curves")

    np.savetxt(dataset_name + '-scores.txt', predicted_image_vector, delimiter=',')
    np.savetxt(dataset_name + '-labels.txt', groundtruth_image_vector, delimiter=',')

Here is the code I use to generate the ROC curves (MATLAB):

function plot_roc(labels_file, scores_file, name_file, dataset_name)

    format longG
    label=dlmread(labels_file);
    scores=dlmread(scores_file);
    [X,Y,T,AUC] = perfcurve(label,scores(:,2),1);   

    f = figure();
    plot(X,Y);
    title(['ROC Curves for Mobilenet in ' dataset_name])
    xlabel('False positive rate');
    ylabel('True positive rate');
    txt = {'Area Under the Curve:', AUC};
    text(0.5,0.5,txt)
    saveas(f, name_file);
    disp("ok")
end

Answer 1

From what I understand about your method, the input image is divided into separate patches that are processed independently by a CNN model. Each patch gets its own classification (or score, depending on whether you read it after or before the softmax). Then the class of the image is determined by a vote over the classes of the patches.

But then, when you build your ROC curve, you use the mean score of the individual patches to determine the classification of the image.

These two different approaches are the reason for the mismatch between the AUC and the normalized accuracy.

For example:

Say you have 3 patches in an image with the following probabilities (for 2 classes):

[cls a, cls b]

[0.51, 0.49]

[0.51, 0.49]

[0.01, 0.99]

By voting, class a is the prediction (2 patches vs. 1); by mean score, class b is the prediction (0.657 vs. 0.343).
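The arithmetic in this example can be checked with a short sketch. It assumes the three patches are [0.51, 0.49] twice and [0.01, 0.99] once, which is what the quoted vote (2 vs. 1) and the quoted means imply. The function names are made up for illustration and are not part of the asker's code.

```python
# Patch-level softmax outputs, one row per patch: [P(class a), P(class b)].
# Assumed values: two patches at [0.51, 0.49] and one at [0.01, 0.99].
patches = [
    [0.51, 0.49],
    [0.51, 0.49],
    [0.01, 0.99],
]
CLASSES = ["a", "b"]


def predict_by_vote(patch_probs):
    """Each patch votes for its most likely class; the class with most votes wins."""
    votes = [0] * len(patch_probs[0])
    for probs in patch_probs:
        votes[probs.index(max(probs))] += 1
    return votes.index(max(votes)), votes


def predict_by_mean(patch_probs):
    """Average the per-class probabilities over all patches, then take the argmax."""
    n = len(patch_probs)
    means = [sum(column) / n for column in zip(*patch_probs)]
    return means.index(max(means)), means


vote_cls, votes = predict_by_vote(patches)
mean_cls, means = predict_by_mean(patches)

print("vote:", CLASSES[vote_cls], votes)                         # vote: a [2, 1]
print("mean:", CLASSES[mean_cls], [round(m, 3) for m in means])  # mean: b [0.343, 0.657]
```

The two aggregation rules disagree here because voting discards how confident each patch is, while averaging lets one very confident patch outweigh two nearly undecided ones.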

Personally, I don't think voting is the right way to classify the image from its patches, because it ignores how certain the model is about each patch, as the example shows. But you are more familiar with your dataset, so perhaps I am wrong.

As for how to overcome your problem, I think some more information about the nature of the dataset and the task would help (how unbalanced it is, what the final goal is, etc.).
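To see how a 50% normalized accuracy and a near-perfect AUC can coexist, here is a toy sketch with made-up numbers; it is not a diagnosis of your data. It assumes that normalized accuracy means the average of the per-class recalls, and all patch scores and helper names are hypothetical. Every image is voted positive, so the vote-based accuracy is 50%, yet the mean scores still rank every positive image above every negative one, and AUC depends only on that ranking.

```python
# Hypothetical data: (true label, patch-level P(positive)); 1 = positive, 0 = negative.
images = [
    (1, [0.90, 0.95, 0.85]),
    (1, [0.80, 0.90, 0.70]),
    (0, [0.60, 0.55, 0.65]),
    (0, [0.55, 0.60, 0.52]),
]


def vote_label(patch_scores, threshold=0.5):
    """Image is positive if more than half of its patches score above the threshold."""
    positive_votes = sum(s > threshold for s in patch_scores)
    return int(positive_votes > len(patch_scores) / 2)


def mean_score(patch_scores):
    """Image score used for the ROC curve: the mean of its patch scores."""
    return sum(patch_scores) / len(patch_scores)


def balanced_accuracy(y_true, y_pred):
    """Average of the recall on positives and the recall on negatives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = sum(t == 0 for t in y_true)
    return 0.5 * (tp / n_pos + tn / n_neg)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [label for label, _ in images]
y_vote = [vote_label(p) for _, p in images]
y_score = [mean_score(p) for _, p in images]

print("balanced accuracy (vote):", balanced_accuracy(y_true, y_vote))  # 0.5
print("AUC (mean score):", auc(y_true, y_score))                       # 1.0
```

In this toy case a fixed cut at 0.5 labels everything positive, while the ROC curve sweeps over all thresholds, so a threshold between 0.6 and 0.8 on the mean score would separate the two classes perfectly.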
