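For simplicity, let us assume we are doing semantic segmentation on a series of one-pixel-high images of width w, with three channels (r, g, b) and n label classes.
In other words, a single image might look like: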
img = [
    [r1, r2, ..., rw],  # channel r
    [g1, g2, ..., gw],  # channel g
    [b1, b2, ..., bw],  # channel b
]
with shape [3, w].
Then, for a given image with w=10 and n=3, its ground-truth labels might be:
# ground "truth"
target = np.array([
    #0  1  2  3  4  5  6  7  8  9   # position
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
and our model might predict the following output:
# prediction
output = np.array([
    #0     1     2     3     4     5     6     7     8     9      # position
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])
To simplify further, let us transform the model's output by binarizing it with a cutoff of 0.9:
# binary mask with cutoff 0.9
b_mask = np.array([
    #0  1  2  3  4  5  6  7  8  9   # position
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
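As a minimal sketch, the binarization step can be written with NumPy as below. The name cutoff and the choice of >= rather than > are assumptions; no value in this example is exactly 0.9, so both choices give the same mask here.

```python
import numpy as np

# Model output copied from the example above: one row per class.
output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

cutoff = 0.9  # assumed name for the threshold used in the question

# Compare every probability with the cutoff and cast True/False to 1/0.
b_mask = (output >= cutoff).astype(int)
print(b_mask)
```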
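Then, if we look at the "objects" of each class, the bounding boxes (or in this case just the boundaries, i.e. the [start, stop] pixels) taken from the binary mask give these predicted objects: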
# "detected" objects
p_obj = [
    [[2, 3], [8, 8]],  # class 1
    [[4, 4], [6, 7]],  # class 2
    [[0, 0]]           # class 3
]
compared with the objects in the ground truth:
# true objects
t_obj = [
    [[1, 3], [6, 9]],  # class 1
    [[4, 7]],          # class 2
    [[0, 0]]           # class 3
]
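The question does not show how the [start, stop] boundaries are read off a mask, so the following is only one possible sketch. The helper name mask_to_spans is hypothetical; it pads each row with zeros and uses np.diff to find where runs of 1s begin and end.

```python
import numpy as np

def mask_to_spans(row):
    """Return inclusive [start, stop] spans of consecutive 1s in a 1D binary row.

    Hypothetical helper, not part of the original question.
    """
    # Pad with zeros so runs touching either edge still produce a transition.
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)      # 0 -> 1 transitions
    stops = np.flatnonzero(changes == -1) - 1  # 1 -> 0 transitions, made inclusive
    return [[int(a), int(b)] for a, b in zip(starts, stops)]

b_mask = np.array([
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
target = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])

p_obj = [mask_to_spans(row) for row in b_mask]
t_obj = [mask_to_spans(row) for row in target]
print(p_obj)  # [[[2, 3], [8, 8]], [[4, 4], [6, 7]], [[0, 0]]]
print(t_obj)  # [[[1, 3], [6, 9]], [[4, 7]], [[0, 0]]]
```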
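If I want a metric that describes, on average, how accurate the boundaries of each object are, what would be an appropriate metric?
I know about IOU from training models that predict bounding boxes, where it is an object-to-object comparison, but what should one do when a single object may be split into several?
I would like a per-class metric that gives me something like this: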
class 1: [-1,  2]  # bounding boxes for class one, on average, start one
                   # pixel before they should and end two pixels after
                   # they should
class 2: [ 0,  3]  # bounding boxes for class two, on average, start
                   # exactly where they should and end three pixels
                   # after they should
class 3: [ 3, -1]  # bounding boxes for class three, on average, start
                   # three pixels after where they should begin and end
                   # one pixel too soon
But I am not sure how best to approach this when a single object is split into several.
Answer 0 (score: 0)
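You specifically asked about the 1D case, so we will solve the 1D case here, but the approach is essentially the same for 2D.
Let us assume you have two ground-truth bounding boxes: box 1 and box 2.
Further, let us assume our model is not very good and predicts more than 2 boxes (maybe it found something new, maybe it split one box into two).
For this demonstration, let us consider that this is what we are working with: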
# labels
# box 1: x----y
# box 2: x++++y
#  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#              x--------y        x+++++++++++++++++++++++++++++y  TRUTH
#              a-----------b                                      PRED 1, BOX 1
#                    a+++++++++++++++++b                          PRED 2, BOX 2
#                 a++++++++++++++++++++++++++++++++b              PRED 3, BOX 2
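What you want is really a score of how well your predictions align with your targets... but, oh no! Which targets belong to which predictions?
Pick a distance function of your choice and pair each prediction with a target based on that function. In this case I will use a modified intersection over union (IOU) for the 1D case. I chose this function because I wanted PRED 2 and PRED 3 in the diagram above to both align with box 2.
Score each prediction and pair it with the target that yields the best score.
Now that each prediction is paired with exactly one target (several predictions may share a target), compute whatever you want.
Based on the assumptions above: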
pred_boxes = [
    [4, 8],
    [6, 12],
    [5, 16]
]
true_boxes = [
    [4, 7],
    [10, 20]
]
A 1D version of intersection over union:
def iou_1d(predicted_boundary, target_boundary):
    '''Calculates the intersection over union (IOU) based on a span.

    Notes:
        boundaries are provided in the form of [start, stop].
        boundaries where start = stop are accepted
        boundaries are assumed to be only in range [0, int < inf)

    Args:
        predicted_boundary (list): the [start, stop] of the predicted boundary
        target_boundary (list): the ground truth [start, stop] for which to compare

    Returns:
        iou (float): the IOU bounded in [0, 1]
    '''
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary

    # boundaries are in form [start, stop] and 0 <= start <= stop
    assert 0 <= p_lower <= p_upper
    assert 0 <= t_lower <= t_upper

    # no overlap, pred is too far left or pred is too far right
    if p_upper < t_lower or p_lower > t_upper:
        return 0

    if predicted_boundary == target_boundary:
        return 1

    intersection_lower_bound = max(p_lower, t_lower)
    intersection_upper_bound = min(p_upper, t_upper)

    # lengths are measured as stop - start, so a one-pixel span has length 0
    intersection = intersection_upper_bound - intersection_lower_bound
    union = max(t_upper, p_upper) - min(t_lower, p_lower)
    union = union if union != 0 else 1
    return min(intersection / union, 1)
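Because the function above measures length as stop - start, a one-pixel prediction such as [8, 8] scores 0 against any target it is not identical to, and a fallback is needed to break the tie. As a hedged alternative (not the answer's function), the sketch below counts pixels inclusively, so the length is stop - start + 1. The name iou_1d_inclusive is hypothetical, and it assumes integer, inclusive [start, stop] boundaries.

```python
def iou_1d_inclusive(predicted_boundary, target_boundary):
    """IoU for inclusive integer [start, stop] spans, counting pixels.

    Hypothetical variant for illustration; it is not the answer's iou_1d.
    """
    p_lower, p_upper = predicted_boundary
    t_lower, t_upper = target_boundary

    # Number of pixels shared by both spans (<= 0 means no overlap).
    intersection = min(p_upper, t_upper) - max(p_lower, t_lower) + 1
    if intersection <= 0:
        return 0.0

    p_len = p_upper - p_lower + 1
    t_len = t_upper - t_lower + 1
    union = p_len + t_len - intersection
    return intersection / union

print(iou_1d_inclusive([8, 8], [6, 9]))  # 0.25
print(iou_1d_inclusive([8, 8], [1, 3]))  # 0.0
print(iou_1d_inclusive([0, 0], [0, 0]))  # 1.0
```

With this variant the single-pixel predictions in the question get a non-zero score against the target they sit inside, so they can be paired without a separate tie-break.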
Some simple helpers:
from math import sqrt

def euclidean(u, v):
    return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

def mean(arr):
    return sum(arr) / len(arr)
How we align the boundaries:
def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
    '''Aligns predicted_boundary to the closest target_boundary based on the
    alignment_scoring_fn

    Args:
        predicted_boundary (list): the predicted boundary in form of [start, stop]
        target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should either be min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn

    Returns:
        aligned_boundary (list): the aligned boundary in form [start, stop]
    '''
    scores = [
        alignment_scoring_fn(predicted_boundary, target_boundary)
        for target_boundary in target_boundaries
    ]

    # boundary did not align to any boxes, use fallback scoring mechanism to break
    # tie (inverse distance: higher is closer, so this fallback assumes take=max)
    if not any(scores):
        scores = [
            1 / euclidean(predicted_boundary, target_boundary)
            for target_boundary in target_boundaries
        ]

    aligned_index = scores.index(take(scores))
    aligned = target_boundaries[aligned_index]
    return aligned
How we calculate the difference:
def diff(u, v):
    return [u[0] - v[0], u[1] - v[1]]
Combining everything into one:
def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
    '''Returns the aggregated distance of predicted bounding boxes to their aligned
    bounding box based on alignment_scoring_fn and distance_fn

    Args:
        predicted_boundaries (list): a list of all predicted boundaries each
            having form [start, stop]
        target_boundaries (list): a list of all valid target boundaries each having
            form [start, stop]
        alignment_scoring_fn (function): a function taking two arguments each of
            which is a list of two elements, the first assumed to be the predicted
            boundary and the latter the target boundary. Should return a single number.
        take (function): should either be min or max. Selects either the highest or
            lowest score according to the alignment_scoring_fn
        distance_fn (function): a function taking two boundaries and returning a
            list of per-coordinate distances, e.g. [start_diff, stop_diff].
        aggregate_fn (function): a function taking a list of numbers (distances
            calculated by distance_fn) and returns a single value (the aggregated
            distance)

    Returns:
        aggregated_distance (list): the aggregated distance of the aligned
            predicted_boundaries, one value per coordinate ([start, stop])
    '''
    # pair every prediction with its best-scoring target
    paired = [
        (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take=take))
        for predicted_boundary in predicted_boundaries
    ]
    distances = [distance_fn(*pair) for pair in paired]
    # aggregate the start differences and the stop differences separately
    aggregated = [aggregate_fn(error) for error in zip(*distances)]
    return aggregated
Running it:
aligned_distance_1d(pred_boxes, true_boxes)
# [-3.0, -3.6666666666666665]
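Note that with many predictions and many targets there are plenty of ways to optimize this code. Here I have broken it into its main functional chunks so that it is clear what is happening.
Now, does this make sense? Well, since I wanted PRED 2 and PRED 3 to align with box 2, yes: both start before the truth and both end early. (The averages also include PRED 1, which aligns with box 1 and contributes [0, 1].)
Copying and pasting your example: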
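As one hedged illustration of such an optimization, the pairwise scores can be computed for all predictions and targets at once with NumPy broadcasting. The names pairwise_iou_1d and aligned_mean_diff are hypothetical, and the sketch deliberately omits the Euclidean tie-break: a prediction that overlaps no target is simply assigned to the first target.

```python
import numpy as np

def pairwise_iou_1d(predicted_boundaries, target_boundaries):
    """Return a (num_preds, num_targets) matrix of span-length IoU scores."""
    p = np.asarray(predicted_boundaries, dtype=float)[:, None, :]  # (P, 1, 2)
    t = np.asarray(target_boundaries, dtype=float)[None, :, :]     # (1, T, 2)

    # Overlap length, clipped at 0 for spans that do not overlap.
    intersection = np.minimum(p[..., 1], t[..., 1]) - np.maximum(p[..., 0], t[..., 0])
    intersection = np.clip(intersection, 0, None)
    union = np.maximum(p[..., 1], t[..., 1]) - np.minimum(p[..., 0], t[..., 0])

    # Avoid dividing by zero when both spans are the same single point.
    iou = np.divide(intersection, union, out=np.zeros_like(intersection), where=union > 0)
    iou[np.all(p == t, axis=-1)] = 1.0  # identical spans score 1
    return iou

def aligned_mean_diff(predicted_boundaries, target_boundaries):
    """Mean [start, stop] difference of each prediction to its best target.

    Assumption: no fallback scoring, so all-zero rows pick target index 0.
    """
    preds = np.asarray(predicted_boundaries, dtype=float)
    targets = np.asarray(target_boundaries, dtype=float)
    best = pairwise_iou_1d(preds, targets).argmax(axis=1)
    return (preds - targets[best]).mean(axis=0)

pred_boxes = [[4, 8], [6, 12], [5, 16]]
true_boxes = [[4, 7], [10, 20]]
print(aligned_mean_diff(pred_boxes, true_boxes))
```

For pred_boxes and true_boxes this gives the same pairing as the loop-based version; results can differ on inputs that rely on the tie-break.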
# "detected" objects
p_obj = [
    [[2, 3], [8, 8]],  # class 1
    [[4, 4], [6, 7]],  # class 2
    [[0, 0]]           # class 3
]
# true objects
t_obj = [
    [[1, 3], [6, 9]],  # class 1
    [[4, 7]],          # class 2
    [[0, 0]]           # class 3
]
Since you know the boxes for each class, this is straightforward:
[
    aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
    for cls_no in range(len(t_obj))
]
# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]
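Does this output make sense?
Starting with a sanity check, look at class 3. The average [start, stop] distances are both 0. Makes sense.
What about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too early (8 < 9). So that makes sense.
Now let us look at class 2, which seems to be why you asked the question (more predictions than targets).
If we draw what the score suggests, it would be: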
#  0  1  2  3  4  5  6  7  8  9
#              ----------        # truth [4, 7]
#                 ++             # pred [4 + 1, 7 - 1.5]
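That does not look great, but this is just an example...
Does this make sense? Yes and no. Yes, in terms of how we calculated the metric: one prediction stops 3 values too early, and the other starts 2 values too late. No, in the sense that none of your predictions actually covers the value 5, yet the metric makes it look as if one does.
Is this the wrong metric?
That depends on what you are using it for or trying to show. However, because you generate the predicted boundaries from a binary mask, the mask is a non-negligible source of this problem. Perhaps there is a better strategy for getting the boundaries from the label probabilities.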
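To make the dependence on the mask concrete, here is a small sketch that extracts boundaries from the question's output at two cutoffs. The helpers mask_to_spans and predicted_objects are hypothetical names, and trying 0.8 is only an illustration, not a recommended setting.

```python
import numpy as np

output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

def mask_to_spans(row):
    """Inclusive [start, stop] spans of consecutive True/1 values (hypothetical helper)."""
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)
    stops = np.flatnonzero(changes == -1) - 1
    return [[int(a), int(b)] for a, b in zip(starts, stops)]

def predicted_objects(probabilities, cutoff):
    """Binarize each class row at the given cutoff and extract its spans."""
    return [mask_to_spans(row) for row in probabilities >= cutoff]

print(predicted_objects(output, 0.9))
# [[[2, 3], [8, 8]], [[4, 4], [6, 7]], [[0, 0]]]
print(predicted_objects(output, 0.8))
# [[[2, 3], [6, 9]], [[4, 7]], [[0, 0]]]
```

At 0.9 the class 2 object is split because position 5 scores 0.89; at 0.8 it stays whole. The boundary metric is therefore partly measuring the cutoff rather than the model.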