Question

Context

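For simplicity, let's assume we are doing semantic segmentation on a series of one-pixel-high images of width w, each with three channels (r, g, b), and n label classes.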

In other words, a single image might look like:

img = [
    [r1, r2, ..., rw], # channel r
    [g1, g2, ..., gw], # channel g
    [b1, b2, ..., bw], # channel b
]

with dimensions [3, w].

Then, for a given image with w=10 and n=3, its ground-truth labels might be:

# ground "truth"
target = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0,    1,    1,    1,    0,    0,    1,    1,    1,    1],    # class 1
  [0,    0,    0,    0,    1,    1,    1,    1,    0,    0],    # class 2
  [1,    0,    0,    0,    0,    0,    0,    0,    0,    0],    # class 3
])

and our model might predict the following output:

# prediction
output = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86], # class 1
  [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21], # class 2
  [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13], # class 3
])

To simplify further, let's convert the model's output into a binary mask by thresholding it at a cutoff of 0.9.

# binary mask with cutoff 0.9
b_mask = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0,    0,    1,    1,    0,    0,    0,    0,    1,    0],    # class 1
  [0,    0,    0,    0,    1,    0,    1,    1,    0,    0],    # class 2
  [1,    0,    0,    0,    0,    0,    0,    0,    0,    0],    # class 3
])
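The thresholding step can be reproduced with a single NumPy comparison. This is a minimal sketch: the helper name `binarize` and the choice of an inclusive cutoff (`>=`) are assumptions, and no score in this example equals 0.9 exactly, so the choice does not matter here.

```python
import numpy as np

# model scores from the example above, shape [n_classes, w]
output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

def binarize(probabilities, cutoff=0.9):
    """Return an integer mask that is 1 where the score is at least `cutoff`."""
    return (probabilities >= cutoff).astype(int)

b_mask = binarize(output)
print(b_mask)
```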

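Then, if we look at the "objects" of each class, i.e. the bounding boxes (or in this case just the boundaries, the [start, stop] pixels), the predicted objects from the binary mask "introduce" an object (class 2 is split in two):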

# "detected" objects
p_obj = [
  [[2, 3], [8, 8]],  # class 1
  [[4, 4], [6, 7]],  # class 2
  [[0, 0]]           # class 3
]

compared to the objects of the ground truth:

# true objects
t_obj = [
  [[1, 3], [6, 9]],  # class 1
  [[4, 7]],          # class 2
  [[0, 0]]           # class 3
]
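For reference, the [start, stop] lists above can be derived from the masks by scanning each row for runs of 1s. The sketch below is one way to do it; the function name `mask_to_boundaries` is hypothetical, and stop indices are treated as inclusive to match the lists above.

```python
def mask_to_boundaries(row):
    """Return inclusive [start, stop] pairs for each run of 1s in a 0/1 row."""
    boundaries = []
    start = None  # index where the current run began, or None outside a run
    for position, value in enumerate(row):
        if value and start is None:
            start = position
        elif not value and start is not None:
            boundaries.append([start, position - 1])
            start = None
    if start is not None:  # a run that reaches the end of the row
        boundaries.append([start, len(row) - 1])
    return boundaries

b_mask = [
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
]
target = [
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
]

p_obj = [mask_to_boundaries(row) for row in b_mask]
t_obj = [mask_to_boundaries(row) for row in target]
print(p_obj)
print(t_obj)
```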

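Question

If I want a metric that describes the accuracy of the boundaries, averaged per object, what would be an appropriate metric?

I understand IOU in the context of training a model that predicts bounding boxes, i.e. an object-to-object comparison, but what should one do when a single object may be fragmented into several?

Goal

I would like a metric that, per class, gives me something like this: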

class 1: [-1, 2]  # bounding boxes for class one, on average start one
                  # pixel before they should and end two pixels after 
                  # they should

class 2: [ 0, 3]  # bounding boxes for class two, on average start 
                  # exactly where they should and end three pixels  
                  # after they should

class 3: [ 3, -1] # bounding boxes for class three, on average start 
                  # three pixels after where they begin and end one 
                  # pixels too soon

But I am not sure how best to approach this when a single object is split into several.

Answer 1

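Assumptions

You are asking specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same for 2D.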
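To illustrate how the scoring would carry over to 2D, here is a minimal sketch of IOU for axis-aligned boxes. The function name `iou_2d` and the `[x_min, y_min, x_max, y_max]` box format are assumptions for illustration, and coordinates are treated as continuous edges rather than inclusive pixel indices.

```python
def iou_2d(predicted_box, target_box):
    """IOU of two axis-aligned boxes given as [x_min, y_min, x_max, y_max]."""
    px0, py0, px1, py1 = predicted_box
    tx0, ty0, tx1, ty1 = target_box

    # overlap along each axis, clamped at zero when the boxes do not touch
    overlap_w = max(0, min(px1, tx1) - max(px0, tx0))
    overlap_h = max(0, min(py1, ty1) - max(py0, ty0))
    intersection = overlap_w * overlap_h

    predicted_area = (px1 - px0) * (py1 - py0)
    target_area = (tx1 - tx0) * (ty1 - ty0)
    union = predicted_area + target_area - intersection
    return intersection / union if union > 0 else 0.0

print(iou_2d([0, 0, 2, 2], [1, 1, 3, 3]))
```

The alignment and averaging steps do not depend on the number of coordinates, so only the scoring function and the per-coordinate difference would need to change.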

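Let's assume you have two ground-truth bounding boxes: box 1 and box 2.

Furthermore, let's assume that our model is not very good and predicts more than 2 boxes (maybe it found something new, maybe it broke one box into two).

For this demonstration, let's say this is what we are working with: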

# labels
# box 1: x----y 
# box 2: x++++y
# 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
#             x--------y        x+++++++++++++++++++++++++++++y     TRUTH
#             a-----------b                                         PRED 1, BOX 1
#                   a+++++++++++++++++b                             PRED 2, BOX 2
#                a++++++++++++++++++++++++++++++++b                 PRED 3, BOX 2

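Core problem

What you want is really a score of how well your predictions align with your targets. ...But wait! Which target belongs to which prediction?

Pick a distance function of your choice and pair each prediction with a target based on that function. In this case I will use a modified intersection over union (IOU) for the 1D case. I chose this function because I want both PRED 2 and PRED 3 in the diagram above to align to box 2.

Score each prediction against every target and pair it with the target that yields the best score.

Now that every prediction is paired with exactly one target (several predictions may share the same target), calculate whatever you want.

Demo with the above assumptions

From the assumptions above: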

pred_boxes = [
    [4,  8],
    [6, 12],
    [5, 16]
]

true_boxes = [
    [4,   7],
    [10, 20]
]

A 1D version of intersection over union:

def iou_1d(predicted_boundary, target_boundary):
  '''Calculates the intersection over union (IOU) based on a span.

  Notes:
    boundaries are provided in the form of [start, stop].
    boundaries where start = stop are accepted
    boundaries are assumed to be only in range [0, int < inf)

  Args:
    predicted_boundary (list): the [start, stop] of the predicted boundary
    target_boundary (list): the ground truth [start, stop] for which to compare

  Returns:
    iou (float): the IOU bounded in [0, 1]
  '''

  p_lower, p_upper = predicted_boundary
  t_lower, t_upper = target_boundary

  # boundaries are in form [start, stop] and 0<= start <= stop
  assert 0<= p_lower <= p_upper
  assert 0<= t_lower <= t_upper

  # no overlap, pred is too far left or pred is too far right
  if p_upper < t_lower or p_lower > t_upper:
    return 0

  if predicted_boundary == target_boundary:
    return 1

  intersection_lower_bound = max(p_lower, t_lower)
  intersection_upper_bound = min(p_upper, t_upper)


  intersection = intersection_upper_bound - intersection_lower_bound
  union = max(t_upper, p_upper) - min(t_lower, p_lower)  
  union = union if union != 0 else 1  
  return min(intersection / union, 1)
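Note that `iou_1d` measures span length as `stop - start`, so a single-pixel boundary such as [8, 8] always has zero intersection and scores 0 unless it equals the target exactly. If [start, stop] are meant as inclusive pixel indices, a pixel-counting variant avoids that. The sketch below is an alternative, not the function used in the rest of this answer, and the name `iou_1d_inclusive` is hypothetical.

```python
def iou_1d_inclusive(predicted_boundary, target_boundary):
    """IOU that counts pixels, treating [start, stop] as inclusive indices."""
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary

    # number of pixels shared by both boundaries (<= 0 means no overlap)
    overlap = min(p_upper, t_upper) - max(p_lower, t_lower) + 1
    if overlap <= 0:
        return 0.0

    predicted_length = p_upper - p_lower + 1
    target_length = t_upper - t_lower + 1
    union = predicted_length + target_length - overlap
    return overlap / union

print(iou_1d_inclusive([8, 8], [6, 9]))  # 0.25: one shared pixel out of four
print(iou_1d_inclusive([4, 8], [4, 7]))  # 0.8: four shared pixels out of five
```

Swapping this in would change the scores, and possibly the pairings, reported below, so treat it as a design option rather than a drop-in replacement.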

Some simple helpers:

from math import sqrt
def euclidean(u, v):
  return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

def mean(arr):
  return sum(arr) / len(arr)

How we align the boundaries:

def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
  '''Aligns predicted_boundary to the closest target_boundary based on the 
    alignment_scoring_fn

  Args:
    predicted_boundary (list): the predicted boundary in form of [start, stop]

    target_boundaries (list): a list of all valid target boundaries each having
      form [start, stop]

    alignment_scoring_fn (function): a function taking two arguments each of 
      which is a list of two elements, the first assumed to be the predicted
      boundary and the latter the target boundary. Should return a single number.

    take (function): should either be min or max. Selects either the highest or
      lowest score according to the alignment_scoring_fn

  Returns:
    aligned_boundary (list): the aligned boundary in form [start, stop]
  '''
  scores = [
      alignment_scoring_fn(predicted_boundary, target_boundary) 
      for target_boundary in target_boundaries
  ]



  # boundary did not align to any boxes, use fallback scoring mechanism to break
  # tie
  if not any(scores):
    scores = [
      1 / euclidean(predicted_boundary, target_boundary)
      for target_boundary in target_boundaries
    ]

  aligned_index = scores.index(take(scores))
  aligned = target_boundaries[aligned_index]
  return aligned

How we calculate the difference:

def diff(u, v):
  return [u[0] - v[0], u[1] - v[1]]

Combining everything into one:

def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
  '''Returns the aggregated distance of predicted bounding boxes to their
    aligned bounding box based on alignment_scoring_fn and distance_fn

  Args:
    predicted_boundaries (list): a list of all predicted boundaries each 
      having form [start, stop]

    target_boundaries (list): a list of all valid target boundaries each having
      form [start, stop]

    alignment_scoring_fn (function): a function taking two arguments each of 
      which is a list of two elements, the first assumed to be the predicted
      boundary and the latter the target boundary. Should return a single number.

    take (function): should either be min or max. Selects either the highest or
      lowest score according to the alignment_scoring_fn

    distance_fn (function): a function taking two boundaries and returning a
      list with one difference per coordinate, e.g. [start_diff, stop_diff]

    aggregate_fn (function): a function taking a list of numbers (distances 
      calculated by distance_fn) and returns a single value (the aggregated 
      distance)

  Returns:
    aggregated_distance (list): the aggregated [start, stop] distances of the
      aligned predicted_boundaries, i.e. aggregate_fn applied per coordinate to
      distance_fn(predicted_boundary, aligned_target_boundary) over all pairs
  '''


  paired = [
      (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take))
      for predicted_boundary in predicted_boundaries
  ]
  distances = [distance_fn(*pair) for pair in paired]
  aggregated = [aggregate_fn(error) for error in zip(*distances)]
  return aggregated

Run it:

aligned_distance_1d(pred_boxes, true_boxes)

# [-3.0, -3.6666666666666665]

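Note that there are many ways to optimize this code for many predictions and many targets. Here I broke it up into its main functional chunks so it is clear what is going on.

Now, does this make sense? Well, since I wanted PRED 2 and PRED 3 to align to box 2, yes: both start before the truth does and both end prematurely.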
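To make the arithmetic behind that result explicit, the sketch below hard-codes the pairings that `iou_1d` produces for the demo boxes and recomputes the averages by hand. The variable names are illustrative only.

```python
# (prediction, aligned target) pairs, worked out from the iou_1d scores
pairs = [
    ([4, 8], [4, 7]),     # IOU 3/4 with box 1, 0 with box 2
    ([6, 12], [10, 20]),  # IOU 1/8 with box 1, 2/14 with box 2
    ([5, 16], [10, 20]),  # IOU 2/12 with box 1, 6/15 with box 2
]

# prediction minus target, per coordinate
start_errors = [pred[0] - true[0] for pred, true in pairs]  # [0, -4, -5]
stop_errors = [pred[1] - true[1] for pred, true in pairs]   # [1, -8, -4]

start_mean = sum(start_errors) / len(start_errors)
stop_mean = sum(stop_errors) / len(stop_errors)
print([start_mean, stop_mean])  # [-3.0, -3.6666666666666665]
```

PRED 1 is nearly exact, so the negative averages are driven almost entirely by the two predictions aligned to box 2.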

Answering your question

Copy-pasting your example:

# "detected" objects
p_obj = [
  [[2, 3], [8, 8]],  # class 1
  [[4, 4], [6, 7]],  # class 2
  [[0, 0]]           # class 3
] 

# true objects
t_obj = [
  [[1, 3], [6, 9]],  # class 1
  [[4, 7]],          # class 2
  [[0, 0]]           # class 3
]

Since you know the boxes for each class, this is straightforward:

[
    aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
    for cls_no in range(len(t_obj))
]


# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]

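Does this output make sense?

Starting with a sanity check, let's look at class 3. The average [start, stop] distances are both 0. Makes sense.

What about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too early (8 < 9), which gives [1.5, -0.5]. So it makes sense.

Now let's look at class 2, which seems to be the reason you asked the question (more predictions than targets).

If we were to draw what the score suggests, it would be: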

#  0  1  2  3  4  5  6  7  8  9
#              ----------        # truth [4, 7]
#                 ++             # pred  [4 + 1, 7 - 1.5]

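That does not look great, but this is just an example...

Does this make sense? Yes and no. Yes, in terms of how we calculated the metric: one prediction stops 3 values too early and the other starts 2 values too late. No, in the sense that neither of your predictions actually covers the value 5, yet the averaged boundaries [5, 5.5] make you think something does.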
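One way to surface that blind spot is to report, next to the boundary offsets, how much of each target is actually covered by predictions. This is a complementary sketch and not part of the metric above; the function name `target_coverage` is hypothetical and [start, stop] are treated as inclusive pixel indices.

```python
def target_coverage(predicted_boundaries, target_boundary):
    """Fraction of the target's pixels covered by at least one prediction."""
    t_lower, t_upper = target_boundary
    target_pixels = set(range(t_lower, t_upper + 1))

    covered = set()
    for p_lower, p_upper in predicted_boundaries:
        covered.update(range(p_lower, p_upper + 1))

    return len(covered & target_pixels) / len(target_pixels)

# class 2: pixels 4, 6 and 7 are covered, pixel 5 is missed
print(target_coverage([[4, 4], [6, 7]], [4, 7]))  # 0.75
```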

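Conclusion

Is this the wrong metric?

That depends on what you are using it for and what you are trying to show with it. However, because you generate the predicted boundaries from a binary mask, the mask itself is a non-negligible source of this problem. There may be better strategies for obtaining boundaries from the label probabilities.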
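As one example of such a strategy, fragments separated by only a small gap could be merged before scoring. This is a sketch under stated assumptions, not something the answer above prescribes: the function name `merge_close_boundaries` and the `max_gap` parameter are hypothetical, and a sensible gap size depends on your data.

```python
def merge_close_boundaries(boundaries, max_gap=1):
    """Merge inclusive [start, stop] runs separated by at most max_gap pixels."""
    merged = []
    for start, stop in sorted(boundaries):
        # gap = number of pixels strictly between the previous run and this one
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return merged

print(merge_close_boundaries([[4, 4], [6, 7]]))  # [[4, 7]]: one-pixel gap closed
print(merge_close_boundaries([[2, 3], [8, 8]]))  # [[2, 3], [8, 8]]: gap too wide
```

With a one-pixel gap allowed, the class 2 prediction becomes a single object that matches the truth, while class 1 is left unchanged.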

Metric for evaluating predicted bounding boxes from semantic segmentation at the object level, outside of training
