Calculating the accuracy of Monte Carlo Dropout in PyTorch

Date: 2020-09-01 16:33:45

Tags: python pytorch montecarlo dropout

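I found an implementation of Monte Carlo Dropout in PyTorch. The main idea is to set the model's dropout layers to train mode, so that a different dropout mask is used in each forward pass. The implementation shows how the predictions from the individual forward passes are stacked together and used to compute several uncertainty measures.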

import sys

import numpy as np

import torch
import torch.nn as nn


def enable_dropout(model):
    """ Function to enable the dropout layers during test-time """
    for m in model.modules():
        if m.__class__.__name__.startswith('Dropout'):
            m.train()

def get_monte_carlo_predictions(data_loader,
                                forward_passes,
                                model,
                                n_classes,
                                n_samples):
    """ Function to get the monte-carlo samples and uncertainty estimates
    through multiple forward passes

    Parameters
    ----------
    data_loader : object
        data loader object from the data loader module
    forward_passes : int
        number of monte-carlo samples/forward passes
    model : torch.nn.Module
        PyTorch model
    n_classes : int
        number of classes in the dataset
    n_samples : int
        number of samples in the test set
    """

    dropout_predictions = np.empty((0, n_samples, n_classes))
    softmax = nn.Softmax(dim=1)
    for _ in range(forward_passes):
        predictions = np.empty((0, n_classes))
        # Put everything in eval mode, then switch only the dropout layers back on
        model.eval()
        enable_dropout(model)
        for image, label in data_loader:

            # Move the batch to whichever device the model's parameters live on
            image = image.to(next(model.parameters()).device)
            with torch.no_grad():
                output = model(image)
                output = softmax(output) # shape (n_samples, n_classes)
            predictions = np.vstack((predictions, output.cpu().numpy()))

        dropout_predictions = np.vstack((dropout_predictions,
                                         predictions[np.newaxis, :, :]))
        # dropout predictions - shape (forward_passes, n_samples, n_classes)
    
    # Calculating mean across multiple MCD forward passes 
    mean = np.mean(dropout_predictions, axis=0) # shape (n_samples, n_classes)

    # Calculating variance across multiple MCD forward passes 
    variance = np.var(dropout_predictions, axis=0) # shape (n_samples, n_classes)

    epsilon = sys.float_info.min
    # Calculating entropy across multiple MCD forward passes 
    entropy = -np.sum(mean*np.log(mean + epsilon), axis=-1) # shape (n_samples,)

    # Calculating mutual information across multiple MCD forward passes 
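    mutual_info = entropy - np.mean(np.sum(-dropout_predictions*np.log(dropout_predictions + epsilon),
                                            axis=-1), axis=0) # shape (n_samples,)

    # Return the stacked per-pass predictions together with the uncertainty measures
    return dropout_predictions, mean, variance, entropy, mutual_info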
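The uncertainty measures computed at the end of get_monte_carlo_predictions can be written out as below. The symbols T (number of forward passes), C (number of classes) and p_t(c | x) (softmax output of pass t for input x) are notation introduced here for illustration and do not appear in the code. The small epsilon added inside the logarithm is left out for readability.

```latex
\begin{align}
\bar{p}(c \mid x) &= \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x) \\
\sigma^2(c \mid x) &= \frac{1}{T} \sum_{t=1}^{T} \bigl(p_t(c \mid x) - \bar{p}(c \mid x)\bigr)^2 \\
H(x) &= -\sum_{c=1}^{C} \bar{p}(c \mid x) \log \bar{p}(c \mid x) \\
I(x) &= H(x) + \frac{1}{T} \sum_{t=1}^{T} \sum_{c=1}^{C} p_t(c \mid x) \log p_t(c \mid x)
\end{align}
```

Here the first line corresponds to `mean`, the second to `variance` (NumPy's default population variance), the third to `entropy`, and the fourth to `mutual_info`, that is, the entropy of the averaged prediction minus the average entropy of the individual passes.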

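What I want to do is compute the accuracy across the different forward passes. Can anyone help me get the accuracy, and point out any changes needed to the dimensions used in this implementation?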

I am using the CIFAR10 dataset and want to use dropout at test time. This is the code for data_loader:

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)

# loading the test set
data_loader = torch.utils.data.DataLoader(testset, batch_size=n_samples, shuffle=False, num_workers=4)

2 answers:

Answer 0 (score: 2)

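Accuracy is the percentage of correctly classified samples. You can create a boolean array that indicates whether each prediction equals its corresponding reference value, and take the mean of those values to get the accuracy. I have provided a code example below.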

import numpy as np

# 2 forward passes, 4 samples, 3 classes
# shape is (2, 4, 3)
dropout_predictions = np.asarray([
    [[0.2, 0.1, 0.7], [0.1, 0.5, 0.4], [0.9, 0.05, 0.05], [0.25, 0.74, 0.01]],
    [[0.1, 0.5, 0.4], [0.2, 0.6, 0.2], [0.8, 0.10, 0.10], [0.25, 0.01, 0.74]]
])

# Get the predicted value for each sample in each forward pass.
# Shape of output is (2, 4).
classes = dropout_predictions.argmax(-1)
# array([[2, 1, 0, 1],
#        [1, 1, 0, 2]])

# Test equality among the reference values and predicted classes.
# Shape is unchanged.
y_true = np.asarray([2, 1, 0, 1])
elementwise_equal = np.equal(y_true, classes)
# array([[ True,  True,  True,  True],
#        [False,  True,  True, False]])

# Calculate the accuracy for each forward pass.
# Shape is (2,).
elementwise_equal.mean(axis=1)
# array([1. , 0.5])

In the example above, you can see that the accuracy of the first forward pass is 100% and the accuracy of the second forward pass is 50%.
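The same idea can be wrapped in a small helper that also reports the accuracy of the prediction averaged over all forward passes, which is often the figure quoted for Monte Carlo Dropout. This is only a sketch: the name `mc_dropout_accuracy` and the toy values are made up for illustration, and it assumes the labels are collected in the same order as the predictions (for example with `shuffle=False` in the data loader).

```python
import numpy as np


def mc_dropout_accuracy(dropout_predictions, y_true):
    """Hypothetical helper: per-pass accuracy and accuracy of the averaged prediction.

    dropout_predictions has shape (forward_passes, n_samples, n_classes),
    y_true has shape (n_samples,).
    """
    dropout_predictions = np.asarray(dropout_predictions)
    y_true = np.asarray(y_true)

    # Per-pass accuracy: argmax over classes, compare with labels, average over samples.
    per_pass = (dropout_predictions.argmax(axis=-1) == y_true).mean(axis=1)

    # Averaged prediction: mean probability over passes first, then argmax.
    mean_probs = dropout_predictions.mean(axis=0)
    averaged = (mean_probs.argmax(axis=-1) == y_true).mean()
    return per_pass, averaged


# Toy values (made up): 2 forward passes, 3 samples, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]],
    [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]],
])
labels = np.array([0, 1, 1])

per_pass, averaged = mc_dropout_accuracy(probs, labels)
print("per-pass accuracy:", per_pass)
print("mean / std over passes:", per_pass.mean(), per_pass.std())
print("accuracy of averaged prediction:", averaged)
```

The spread of the per-pass accuracies gives a rough sense of how much the dropout masks change the result, while the averaged prediction uses all passes at once.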

Answer 1 (score: 0)

@jakub's answer is correct. However, I would like to suggest an alternative approach that may be better, especially for more junior researchers.

Scikit-learn has many built-in performance metrics, including accuracy. To use them with PyTorch, you only need to convert the torch tensors to numpy arrays:

  x = torch.Tensor(...) # Fill-in as needed
  x_np = x.detach().cpu().numpy() # Convert to numpy (detach and move to CPU first)

Then import the metric from scikit-learn:

   from sklearn.metrics import accuracy_score
   y_pred = [0, 2, 1, 3]
   y_true = [0, 1, 2, 3]
   accuracy_score(y_true, y_pred)

This simply returns 0.5. It is easy to use and hard to get wrong.
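For the Monte Carlo Dropout case in the question, `accuracy_score` could be applied to each forward pass in turn. The sketch below reuses the toy values from the first answer and assumes the stacked predictions have shape (forward_passes, n_samples, n_classes); the variable name `per_pass_accuracy` is only illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy stack of softmax outputs: 2 forward passes, 4 samples, 3 classes.
dropout_predictions = np.array([
    [[0.2, 0.1, 0.7], [0.1, 0.5, 0.4], [0.9, 0.05, 0.05], [0.25, 0.74, 0.01]],
    [[0.1, 0.5, 0.4], [0.2, 0.6, 0.2], [0.8, 0.10, 0.10], [0.25, 0.01, 0.74]],
])
y_true = np.array([2, 1, 0, 1])

# One accuracy value per forward pass: argmax over classes, then score.
per_pass_accuracy = [
    float(accuracy_score(y_true, pass_probs.argmax(axis=-1)))
    for pass_probs in dropout_predictions
]
print(per_pass_accuracy)  # [1.0, 0.5]
```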