I have recently been working on some image segmentation tasks and want to implement one from scratch.
As I understand it, segmentation is the per-pixel prediction of which region a pixel belongs to: object instances (things) and background segments (stuff).
According to the COCO dataset, on which the state-of-the-art Mask R-CNN is based:
Things are countable objects such as people, animals and tools. Stuff classes are amorphous regions of similar texture or material, such as grass, sky and road.
According to the Mask R-CNN paper, the final classification uses a binary cross-entropy loss with a per-pixel sigmoid (to avoid competition among classes). The pipeline is built on top of the Faster R-CNN object-detection pipeline: it takes the regions of interest (RoIs) from there and passes them through a RoIAlign layer to keep the spatial information intact.
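To make that concrete for myself, here is a minimal sketch of that per-pixel sigmoid plus binary cross-entropy as I understand it (all shapes and names below are made up, and in the actual paper the loss is only evaluated on the mask channel of the ground-truth class):

import torch
import torch.nn as nn

# Hypothetical shapes: one RoI, K classes, an m x m mask-head resolution.
K, m = 3, 28
mask_logits = torch.randn(1, K, m, m)                 # raw per-class mask scores
gt_masks = torch.randint(0, 2, (1, K, m, m)).float()  # binary ground-truth masks

# Per-pixel sigmoid + binary cross-entropy: no softmax, so classes do not compete.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(mask_logits, gt_masks)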
What confuses me is the following. Take the very simple snippet below, which applies a binary cross-entropy loss to three separate fully connected layers (a bit of random experimentation with scales on my part):
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelMain(nn.Module):
    def __init__(self, config, is_training=True):
        super(ModelMain, self).__init__()
        self.fc_1 = torch.nn.Linear(incoming_size_1, outgoing_size_1)
        self.fc_2 = torch.nn.Linear(incoming_size_2, outgoing_size_2)
        self.fc_3 = torch.nn.Linear(incoming_size_3, outgoing_size_3)

    def forward(self, x):
        y_1 = F.sigmoid(self.fc_1(x))
        y_2 = F.sigmoid(self.fc_2(x))
        y_3 = F.sigmoid(self.fc_3(x))
        return y_1, y_2, y_3

model = ModelMain(config)  # config assumed to be defined elsewhere
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def run_epoch():
    batchsize = 10
    for epoch in range(batchsize):
        # Find the image segments predicted by running a forward pass:
        y_predicted_1, y_predicted_2, y_predicted_3 = model(batch_data_x)
        # Compute and print the loss:
        loss_1 = criterion(y_predicted_1, batch_data_y)
        loss_2 = criterion(y_predicted_2, batch_data_y)
        loss_3 = criterion(y_predicted_3, batch_data_y)
        print("Epoch ", epoch, "Loss : ", loss_1, loss_2, loss_3)
        # Perform the backward pass:
        optimizer.zero_grad()
        loss_1.backward()
        loss_2.backward()
        loss_3.backward()
        optimizer.step()
... what exactly do we supply as labels here?
From the dataset:
Formatted JSON data
Image:
{
"license":2,
"file_name":"000000000139.jpg",
"coco_url":"http://images.cocodataset.org/val2017/000000000139.jpg",
"height":426,
"width":640,
"date_captured":"2013-11-21 01:34:01",
"flickr_url":"http://farm9.staticflickr.com/8035/8024364858_9c41dc1666_z.jpg",
"id":139
}
Segment info:
{
"segments_info":[
{
"id":3226956,
"category_id":1,
"iscrowd":0,
"bbox":[
413,
158,
53,
138
],
"area":2840
},
{
"id":6979964,
"category_id":1,
"iscrowd":0,
"bbox":[
384,
172,
16,
36
],
"area":439
},
{
"id":3103374,
"category_id":62,
"iscrowd":0,
"bbox":[
413,
223,
30,
81
],
"area":1250
},
{
"id":2831194,
"category_id":62,
"iscrowd":0,
"bbox":[
291,
218,
62,
98
],
"area":1848
},
{
"id":3496593,
"category_id":62,
"iscrowd":0,
"bbox":[
412,
219,
10,
13
],
"area":90
},
{
"id":2633066,
"category_id":62,
"iscrowd":0,
"bbox":[
317,
219,
22,
12
],
"area":212
},
{
"id":3165572,
"category_id":62,
"iscrowd":0,
"bbox":[
359,
218,
56,
103
],
"area":2251
},
{
"id":8824489,
"category_id":64,
"iscrowd":0,
"bbox":[
237,
149,
24,
62
],
"area":369
},
{
"id":3032951,
"category_id":67,
"iscrowd":0,
"bbox":[
321,
231,
126,
89
],
"area":2134
},
{
"id":2038814,
"category_id":72,
"iscrowd":0,
"bbox":[
7,
168,
149,
95
],
"area":13247
},
{
"id":3289671,
"category_id":72,
"iscrowd":0,
"bbox":[
557,
209,
82,
79
],
"area":5846
},
{
"id":2437710,
"category_id":78,
"iscrowd":0,
"bbox":[
512,
206,
15,
16
],
"area":224
},
{
"id":4159376,
"category_id":82,
"iscrowd":0,
"bbox":[
493,
174,
20,
108
],
"area":2056
},
{
"id":3423599,
"category_id":84,
"iscrowd":0,
"bbox":[
613,
308,
13,
46
],
"area":324
},
{
"id":3094634,
"category_id":84,
"iscrowd":0,
"bbox":[
605,
306,
14,
45
],
"area":331
},
{
"id":3296100,
"category_id":85,
"iscrowd":0,
"bbox":[
448,
121,
14,
22
],
"area":227
},
{
"id":6054280,
"category_id":86,
"iscrowd":0,
"bbox":[
241,
195,
14,
18
],
"area":187
},
{
"id":5942189,
"category_id":86,
"iscrowd":0,
"bbox":[
549,
309,
36,
90
],
"area":2171
},
{
"id":4086154,
"category_id":86,
"iscrowd":0,
"bbox":[
351,
209,
11,
22
],
"area":178
},
{
"id":7438777,
"category_id":86,
"iscrowd":0,
"bbox":[
337,
200,
10,
16
],
"area":120
},
{
"id":3031159,
"category_id":118,
"iscrowd":0,
"bbox":[
0,
269,
564,
157
],
"area":49754
},
{
"id":9284267,
"category_id":119,
"iscrowd":0,
"bbox":[
338,
166,
29,
50
],
"area":842
},
{
"id":6068135,
"category_id":130,
"iscrowd":0,
"bbox":[
212,
11,
321,
127
],
"area":3391
},
{
"id":2567230,
"category_id":156,
"iscrowd":0,
"bbox":[
129,
168,
351,
162
],
"area":5699
},
{
"id":10334639,
"category_id":181,
"iscrowd":0,
"bbox":[
204,
63,
234,
174
],
"area":15587
},
{
"id":6266027,
"category_id":186,
"iscrowd":0,
"bbox":[
136,
0,
473,
116
],
"area":20106
},
{
"id":5274512,
"category_id":188,
"iscrowd":0,
"bbox":[
0,
38,
549,
297
],
"area":25483
},
{
"id":7238567,
"category_id":189,
"iscrowd":0,
"bbox":[
457,
350,
183,
76
],
"area":9421
},
{
"id":4224910,
"category_id":199,
"iscrowd":0,
"bbox":[
0,
0,
640,
358
],
"area":83201
},
{
"id":6391959,
"category_id":200,
"iscrowd":0,
"bbox":[
135,
359,
336,
67
],
"area":12618
}
],
"file_name":"000000000139.png",
"image_id":139
}
Mask image:
For the object-detection task we have bounding boxes, but for image segmentation I need to use the provided masks to compute the loss. So what should the value of batch_data_y in the code above be? Should it be a vector of the mask image? But wouldn't that just teach my network what color a segment is? Or is there some other segmentation annotation that I am missing?
Answer 0 (score: 1)
As @hkchengrex mentioned in the comments, the colors in the mask image appear to be picked from the real image, which is either a coincidence or the result of some post-processing for visualization.
Semantic masks are usually represented/stored as images in which each pixel value stands for the class of that pixel in the actual picture. For instance, supposing you are considering C classes, the semantic mask M of a picture I can be represented as an image where M(i,j) = c means that pixel I(i,j) should be classified as belonging to the semantic class c (with c in [0, C[, i in [0, H[, j in [0, W[, and (H, W) the dimensions of I).
Now, since the classes are independent of each other, the best way for the network to predict them is to output a probability map P of shape (H, W, C), where P(i,j,c) represents the estimated probability that I(i,j) belongs to class c (a value between 0 and 1, hence the sigmoid-like activation function).
As you detailed yourself, with such an output you can train your network using the binary cross-entropy loss, supposing you pre-process your ground-truth masks, converting them from images with values in [0, C[ into one-hot maps of shape (H, W, C) with values in {0, 1}. This pre-processing is called one-hot conversion and can be done in PyTorch using scatter(), cf. this thread:
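A minimal sketch of such a one-hot conversion (the shapes below are made up for the example):

import torch

# Hypothetical batch of N ground-truth masks of size H x W with values in [0, C[.
N, H, W, C = 2, 4, 4, 5
labels = torch.randint(0, C, (N, H, W))        # integer class mask M

# scatter_ writes a 1 into the channel given by each pixel's label.
one_hot = torch.zeros(N, C, H, W)
one_hot.scatter_(1, labels.unsqueeze(1), 1.0)  # shape (N, C, H, W), values in {0, 1}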
However, another solution (which may not fit your problem, if you want to avoid the softmax it involves) is to use the usual (non-binary) cross-entropy loss. torch.nn.CrossEntropyLoss() will directly take the predicted map P (of shape H x W x C) as the prediction and the mask M (of shape H x W, with values in [0, C[) as the target.
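A minimal sketch of that variant (note that PyTorch actually lays the prediction out as (N, C, H, W) and the target as (N, H, W)):

import torch
import torch.nn as nn

N, C, H, W = 2, 5, 4, 4
logits = torch.randn(N, C, H, W)         # raw scores; the loss applies log-softmax itself
target = torch.randint(0, C, (N, H, W))  # M(i,j) = c, with c in [0, C[

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)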
Answer 1 (score: 0)
The intuition of @Aldream was correct, but to be explicit about the COCO dataset: they provide binary masks, and the documentation on their website is not very good:
Interface for manipulating masks stored in RLE format.
RLE is a simple yet efficient format for storing binary masks. RLE first divides a vector (or vectorized image) into a series of piecewise constant regions and then for each piece simply stores the length of that piece. For example, given M = [0 0 1 1 1 0 1] the RLE counts would be [2 3 1 1], or for M = [1 1 1 1 1 1 0] the counts would be [0 6 1] (note that the odd counts are always the numbers of zeros). Instead of storing the counts directly, additional compression is achieved with a variable bitrate representation based on a common scheme called LEB128. Source: link
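To illustrate the counts scheme from that quote, here is a small self-contained sketch (pure Python, not the actual pycocotools implementation, which additionally applies the LEB128-style compression):

def rle_counts(m):
    # Run lengths of alternating values, always starting with the number of zeros.
    counts = [0] if m and m[0] == 1 else []
    run = 1
    for prev, cur in zip(m, m[1:]):
        if cur == prev:
            run += 1
        else:
            counts.append(run)
            run = 1
    if m:
        counts.append(run)
    return counts

print(rle_counts([0, 0, 1, 1, 1, 0, 1]))  # [2, 3, 1, 1]
print(rle_counts([1, 1, 1, 1, 1, 1, 0]))  # [0, 6, 1]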
I did, however, write my own custom function for the average binary cross-entropy loss:
import torch
import torch.nn.functional as F

def l_cross_entropy2d(input, target, weight=None, size_average=True):
    n, c, h, w = input.size()
    nt, ct, ht, wt = target.size()

    # Handle inconsistent size between input and target
    if h > ht and w > wt:  # upsample labels
        target = F.upsample(target.float(), size=(h, w), mode='nearest').long()
    elif h < ht and w < wt:  # upsample images
        input = F.upsample(input, size=(ht, wt), mode='bilinear')
    elif h != ht or w != wt:
        raise Exception("Only support upsampling")

    # take the per-pixel sigmoid
    sigm = torch.sigmoid(input)
    # change dimensions to create a 2d matrix where rows -> pixels and columns -> classes
    # takes an input tensor <n x c x h x w> and outputs a tensor <n*h*w x c>
    sigm = sigm.transpose(1, 2).transpose(2, 3).contiguous().view(-1, c)
    # change the target to a column tensor for calculating cross entropy and repeat it number-of-classes times
    # keep all values from the sigmoid tensor whose label is >= 0 (all pixels that have a value)
    sigm = sigm[target.view(-1, 1).repeat(1, c) >= 0]
    sigm = sigm.view(-1, c)

    mask = target >= 0
    target = target[mask]
    # nll_loss expects log-probabilities, so take the log of the sigmoid output
    loss = F.nll_loss(torch.log(sigm), target, ignore_index=250,
                      weight=weight, size_average=False)
    if size_average:
        loss /= mask.data.sum()
    return loss
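A quick smoke test of the function with dummy tensors (all shapes made up; the target here is an integer label map of shape (n, 1, h, w)):

import torch

n, c, h, w = 2, 5, 8, 8
scores = torch.randn(n, c, h, w)           # raw per-class scores
masks = torch.randint(0, c, (n, 1, h, w))  # integer labels in [0, c[

loss = l_cross_entropy2d(scores, masks)
print(loss.item())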