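Metric for evaluating predicted bounding boxes from semantic segmentation at the object level, outside of training

Time: 2018-11-20 09:40:24

Tags: python tensorflow machine-learning computer-vision semantic-segmentation

Context

For simplicity, let's assume we are doing semantic segmentation on a series of one-pixel-high images of width w, each with three channels (r, g, b), and n label classes.

In other words, a single image might look like: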

img = [
    [r1, r2, ..., rw], # channel r
    [g1, g2, ..., gw], # channel g
    [b1, b2, ..., bw], # channel b
]

with dimensions [3, w].

Then, for a given image with w=10 and n=3, its ground-truth labels might be:

# ground "truth"
target = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0,    1,    1,    1,    0,    0,    1,    1,    1,    1],    # class 1
  [0,    0,    0,    0,    1,    1,    1,    1,    0,    0],    # class 2
  [1,    0,    0,    0,    0,    0,    0,    0,    0,    0],    # class 3
])

and our model might predict as output:

# prediction
output = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86], # class 1
  [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21], # class 2
  [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13], # class 3
])

To simplify further, let's convert the model's output into a binary mask by thresholding it at a cutoff of 0.9.

# binary mask with cutoff 0.9
b_mask = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0,    0,    1,    1,    0,    0,    0,    0,    1,    0],    # class 1
  [0,    0,    0,    0,    1,    0,    1,    1,    0,    0],    # class 2
  [1,    0,    0,    0,    0,    0,    0,    0,    0,    0],    # class 3
])
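
The thresholding step can be reproduced with a single comparison. This is a minimal sketch, assuming a score exactly equal to the cutoff counts as positive (the question does not say which way ties go); `binarize` is a hypothetical helper name.

```python
import numpy as np

output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

def binarize(scores, cutoff=0.9):
    # Assumption: scores >= cutoff become 1, everything else becomes 0.
    return (scores >= cutoff).astype(int)

b_mask = binarize(output)
print(b_mask)
```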

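Then, if we look at the "objects" of each class, the bounding boxes (in this case just boundaries, i.e. [start, stop] pixels) of the predicted objects from the binary mask "introduce" an extra object: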

# "detected" objects
p_obj = [
  [[2, 3], [8, 8]],  # class 1
  [[4, 4], [6, 7]],  # class 2
  [[0, 0]]           # class 3
] 

compared with the objects of the ground truth:

# true objects
t_obj = [
  [[1, 3], [6, 9]],  # class 1
  [[4, 7]],          # class 2
  [[0, 0]]           # class 3
] 
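
Going from a mask row to [start, stop] boundaries can be sketched as follows, assuming each run of consecutive 1s is one object and that stop is inclusive. `mask_to_boundaries` is a hypothetical helper, not something the question defines.

```python
import numpy as np

def mask_to_boundaries(row):
    """Return the inclusive [start, stop] span of every run of 1s in a 1D mask."""
    row = np.asarray(row, dtype=int)
    # Pad with zeros so runs touching either edge still have a rise and a fall.
    edges = np.diff(np.concatenate(([0], row, [0])))
    starts = np.flatnonzero(edges == 1)      # first pixel of each run
    stops = np.flatnonzero(edges == -1) - 1  # last pixel of each run
    return [[int(a), int(b)] for a, b in zip(starts, stops)]

b_mask = [
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
]
target = [
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
]

p_obj = [mask_to_boundaries(row) for row in b_mask]
t_obj = [mask_to_boundaries(row) for row in target]
print(p_obj)
print(t_obj)
```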

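Question

If I want a metric that describes the accuracy of the boundaries, on average per object, what would be a suitable metric?

I know of IOU in the context of training models that predict bounding boxes, i.e. an object-to-object comparison, but what should one do when a single object may be split into several?

Goal

I would like a per-class metric that gives me something like this: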

class 1: [-1, 2]  # bounding boxes for class one, on average start one
                  # pixel before they should and end two pixels after 
                  # they should

class 2: [ 0, 3]  # bounding boxes for class two, on average start 
                  # exactly where they should and end three pixels  
                  # after they should

class 3: [ 3, -1] # bounding boxes for class three, on average start 
                  # three pixels after where they begin and end one 
                  # pixel too soon

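but I am not sure how best to handle this when a single object is split into several.

1 answer:

Answer 0 (score: 0)

Assumptions

You are asking specifically about the 1D case, so we will solve the 1D case here; the approach is essentially the same for 2D.

Let's assume you have two ground-truth bounding boxes: box 1 and box 2.

Further, let's assume our model is not that great and predicts more than 2 boxes (maybe it found something new, maybe it split one box into two).

For this demonstration, let's say this is what we are working with: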

# labels
# box 1: x----y 
# box 2: x++++y
# 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
#             x--------y        x+++++++++++++++++++++++++++++y     TRUTH
#             a-----------b                                         PRED 1, BOX 1
#                   a+++++++++++++++++b                             PRED 2, BOX 2
#                a++++++++++++++++++++++++++++++++b                 PRED 3, BOX 2

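The core problem

What you really want is a score for how well your predictions align with your targets... but wait! Which targets belong to which predictions?

Pick a distance function of your choice and pair each prediction with a target based on that function. In this case I will use a modified intersection over union (IOU) for the 1D case. I chose this function because I want both PRED 2 and PRED 3 in the diagram above to align with box 2.

Score each prediction and pair it with the target that yields the best score.

Now that every prediction is paired with exactly one target, compute whatever you want.

Demonstration with the above assumptions

Given the assumptions above: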

pred_boxes = [
    [4,  8],
    [6, 12],
    [5, 16]
]

true_boxes = [
    [4,   7],
    [10, 20]
]

A 1D version of intersection over union:

def iou_1d(predicted_boundary, target_boundary):
  '''Calculates the intersection over union (IOU) based on a span.

  Notes:
    boundaries are provided in the form of [start, stop].
    boundaries where start = stop are accepted
    boundaries are assumed to be only in range [0, int < inf)

  Args:
    predicted_boundary (list): the [start, stop] of the predicted boundary
    target_boundary (list): the ground truth [start, stop] for which to compare

  Returns:
    iou (float): the IOU bounded in [0, 1]
  '''

  p_lower, p_upper = predicted_boundary
  t_lower, t_upper = target_boundary

  # boundaries are in form [start, stop] and 0<= start <= stop
  assert 0<= p_lower <= p_upper
  assert 0<= t_lower <= t_upper

  # no overlap, pred is too far left or pred is too far right
  if p_upper < t_lower or p_lower > t_upper:
    return 0

  if predicted_boundary == target_boundary:
    return 1

  intersection_lower_bound = max(p_lower, t_lower)
  intersection_upper_bound = min(p_upper, t_upper)


  intersection = intersection_upper_bound - intersection_lower_bound
  union = max(t_upper, p_upper) - min(t_lower, p_lower)  
  union = union if union != 0 else 1  
  return min(intersection / union, 1)
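
Note that `iou_1d` measures [start, stop] as a continuous span, so a single-pixel box such as [8, 8] has zero length and scores 0 against any target other than itself. A hedged alternative, assuming boundaries are inclusive pixel indices, is to count pixels instead. `iou_1d_inclusive` is a hypothetical name and not part of this answer; swapping it in may change which target a prediction is paired with, and therefore the numbers reported further down.

```python
def iou_1d_inclusive(predicted_boundary, target_boundary):
    """IOU of two inclusive pixel ranges [start, stop], counted in pixels."""
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary
    # Number of shared pixels; zero or negative means the ranges are disjoint.
    intersection = min(p_upper, t_upper) - max(p_lower, t_lower) + 1
    if intersection <= 0:
        return 0.0
    p_size = p_upper - p_lower + 1
    t_size = t_upper - t_lower + 1
    return intersection / (p_size + t_size - intersection)

print(iou_1d_inclusive([8, 8], [6, 9]))  # 0.25 (1 shared pixel out of 4)
print(iou_1d_inclusive([4, 8], [4, 7]))  # 0.8  (4 shared pixels out of 5)
print(iou_1d_inclusive([2, 3], [6, 9]))  # 0.0  (disjoint)
```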

Some simple helpers:

from math import sqrt
def euclidean(u, v):
  return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

def mean(arr):
  return sum(arr) / len(arr)

How we align boundaries:

def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
  '''Aligns predicted_boundary to the closest target_boundary based on the 
    alignment_scoring_fn

  Args:
    predicted_boundary (list): the predicted boundary in form of [start, stop]

    target_boundaries (list): a list of all valid target boundaries each having
      form [start, stop]

    alignment_scoring_fn (function): a function taking two arguments each of 
      which is a list of two elements, the first assumed to be the predicted
      boundary and the latter the target boundary. Should return a single number.

    take (function): should either be min or max. Selects either the highest or
      lowest score according to the alignment_scoring_fn

  Returns:
    aligned_boundary (list): the aligned boundary in form [start, stop]
  '''
  scores = [
      alignment_scoring_fn(predicted_boundary, target_boundary) 
      for target_boundary in target_boundaries
  ]



  # boundary did not align to any boxes, use fallback scoring mechanism to break
  # tie
  if not any(scores):
    scores = [
      1 / euclidean(predicted_boundary, target_boundary)
      for target_boundary in target_boundaries
    ]

  aligned_index = scores.index(take(scores))
  aligned = target_boundaries[aligned_index]
  return aligned

How we compute the difference:

def diff(u, v):
  return [u[0] - v[0], u[1] - v[1]]

Putting it all together:

def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
  '''Returns the aggregated distance of predicted bounding boxes to their
    aligned bounding box based on alignment_scoring_fn and distance_fn

  Args:
    predicted_boundaries (list): a list of all predicted boundaries each 
      having form [start, stop]

    target_boundaries (list): a list of all valid target boundaries each having
      form [start, stop]

    alignment_scoring_fn (function): a function taking two arguments each of 
      which is a list of two elements, the first assumed to be the predicted
      boundary and the latter the target boundary. Should return a single number.

    take (function): should either be min or max. Selects either the highest or
      lowest score according to the alignment_scoring_fn

    distance_fn (function): a function taking two boundaries (predicted, 
      target) and returning one distance per coordinate, e.g. [start, stop]

    aggregate_fn (function): a function taking a list of numbers (distances 
      calculated by distance_fn) and returns a single value (the aggregated 
      distance)

  Returns:
    aggregated (list): the aggregated distance of the aligned 
      predicted_boundaries, one value per coordinate, i.e.

      [aggregate_fn(errors) for errors in zip(*[distance_fn(p, t) for p, t in paired])]
  '''


  paired = [
      # forward take so a min-based scoring function is honoured as well
      (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take=take))
      for predicted_boundary in predicted_boundaries
  ]
  distances = [distance_fn(*pair) for pair in paired]
  aggregated = [aggregate_fn(error) for error in zip(*distances)]
  return aggregated

Running it:

aligned_distance_1d(pred_boxes, true_boxes)

# [-3.0, -3.6666666666666665]

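Note that with many predictions and many targets there are plenty of ways to optimize this code. Here I broke it into its main functional chunks so that it is clear what is going on.

Does this make sense? Well, since I wanted PRED 2 and PRED 3 to align with box 2, yes: both start before the ground truth and both end early.

Answering the question

Copy-pasting your example: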

# "detected" objects
p_obj = [
  [[2, 3], [8, 8]],  # class 1
  [[4, 4], [6, 7]],  # class 2
  [[0, 0]]           # class 3
] 

# true objects
t_obj = [
  [[1, 3], [6, 9]],  # class 1
  [[4, 7]],          # class 2
  [[0, 0]]           # class 3
] 

Since you know the boxes for each class, this is simple:

[
    aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
    for cls_no in range(len(t_obj))
]


# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]

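Does this output make sense?

Starting with a sanity check, look at class 3. The average distance for [start, stop] is 0 for both. Makes sense.

How about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too early (8 < 9). So that makes sense.

Now let's look at class 2, which seems to be why you asked the question (more predictions than targets).

If we were to draw what the score suggests, it would be: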

#  0  1  2  3  4  5  6  7  8  9
#              ----------        # truth [4, 7]
#                 ++             # pred  [4 + 1, 7 - 1.5]

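It does not look great, but this is just an example...

Does this make sense? Yes and no. Yes, in terms of how we compute the metric: one prediction stops 3 values too early and the other starts 2 values too late. No, in the sense that neither of your predictions actually covers the value 5, yet the metric makes it look as if it is covered.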
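
The "no" half of that answer can be made concrete by checking pixel coverage directly. This is a minimal sketch, assuming inclusive [start, stop] pixel ranges; `covered_fraction` is a hypothetical helper and not part of the metric above.

```python
def covered_fraction(predicted_boundaries, target_boundary):
    """Fraction of the target's pixels that fall inside at least one prediction."""
    t_lower, t_upper = target_boundary
    target_pixels = set(range(t_lower, t_upper + 1))
    predicted_pixels = set()
    for p_lower, p_upper in predicted_boundaries:
        predicted_pixels.update(range(p_lower, p_upper + 1))
    return len(target_pixels & predicted_pixels) / len(target_pixels)

# Class 2 from the question: pixel 5 is missed by both predictions.
print(covered_fraction([[4, 4], [6, 7]], [4, 7]))  # 0.75
```

Reporting a number like this next to the averaged [start, stop] offsets shows when a target is only partly covered even though the offsets look small.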

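Conclusion

Is this the wrong metric?

That depends on what you are using it for and what you are trying to show. However, you generate the predicted boundaries from a binary mask, and that is a non-negligible root of this problem. Perhaps there is a better strategy for obtaining boundaries from the label probabilities.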
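
How much the split depends on the cutoff can be checked directly on the `output` array from the question. This is a minimal sketch; `count_objects` is a hypothetical helper, and 0.8 is an arbitrary alternative cutoff chosen for illustration, not a recommendation.

```python
import numpy as np

output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

def count_objects(scores, cutoff):
    """Count runs of consecutive pixels with score >= cutoff, per class row."""
    mask = (scores >= cutoff).astype(int)
    # Prepend a zero column so a run starting at pixel 0 still has a rising edge.
    padded = np.concatenate([np.zeros((mask.shape[0], 1), dtype=int), mask], axis=1)
    rising_edges = np.diff(padded, axis=1) == 1
    return rising_edges.sum(axis=1).tolist()

print(count_objects(output, 0.9))  # [2, 2, 1]
print(count_objects(output, 0.8))  # [2, 1, 1]
```

At the 0.9 cutoff, class 2 is split into two objects because pixel 5 scores 0.89. At 0.8 it comes out as a single object, which matches the ground-truth object counts of [2, 1, 1].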