I'm a beginner, so please bear with me if I get anything wrong. I'm building Faster R-CNN from scratch purely for practice, using the pretrained VGG16 mentioned in the paper. How exactly do I pass input to the RPN? I've generated some hard-coded anchor boxes for each image; how do I train on them now? Also, it would help a lot if someone could demonstrate how to use the tf.data module for feeding images.
Below is my model. RPN: classifier section (the first two layers are VGG16 layers).
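To make the description concrete, here is roughly what I mean in code (a minimal Keras sketch of my understanding of the RPN head on VGG16 features; `num_anchors=9` and the 512-filter sliding window are my assumptions from the paper, not tested code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_rpn_head(num_anchors=9):
    """Sketch of an RPN head on top of VGG16 conv features (num_anchors assumed)."""
    # VGG16 backbone without the fully connected layers; use weights='imagenet'
    # for the pretrained backbone (weights=None here just to avoid the download).
    backbone = tf.keras.applications.VGG16(include_top=False, weights=None,
                                           input_shape=(None, None, 3))
    features = backbone.output  # (H/16, W/16, 512) feature map
    # 3x3 "sliding window" over the feature map, as described in the paper.
    shared = layers.Conv2D(512, 3, padding='same', activation='relu')(features)
    # Objectness score per anchor (sigmoid: object vs background).
    cls = layers.Conv2D(num_anchors, 1, activation='sigmoid', name='rpn_cls')(shared)
    # Four box-regression offsets (dx, dy, dw, dh) per anchor.
    reg = layers.Conv2D(num_anchors * 4, 1, name='rpn_reg')(shared)
    return tf.keras.Model(backbone.input, [cls, reg])
```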
Here is the dataset I'm using: https://www.kaggle.com/zaraks/pascal-voc-2007
Here is my anchor box code:
    import os
    import xmltodict

    def filewiseBB():
        '''Returns a dict mapping {filename: [list of bounding boxes]}.'''
        base_dir = '/tmp/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/Annotations/'
        bbs = {}
        for j in os.listdir(base_dir):
            with open(base_dir + j) as fd:
                doc = xmltodict.parse(fd.read())
            objects = doc['annotation']['object']
            if not isinstance(objects, list):  # a lone object parses as a dict, not a list
                objects = [objects]
            bboxes = []
            for obj in objects:
                if int(obj['difficult']) == 0:  # skip objects flagged as difficult
                    bboxes.append([obj['name'],
                                   obj['bndbox']['xmin'],
                                   obj['bndbox']['ymin'],
                                   obj['bndbox']['xmax'],
                                   obj['bndbox']['ymax']])
            bbs[j] = bboxes
        return bbs
    import cv2

    def anchorBoxGenerator(imagename):
        '''Returns anchor boxes [[x, y, w, h], ...] that lie entirely inside the image.'''
        image = cv2.imread('/tmp/VOCtrainval_06-Nov-2007/VOCdevkit/VOC2007/JPEGImages/'
                           + imagename[:-4] + '.jpg')
        height, width, _ = image.shape
        ancBoxes = []
        dims_boxes = [20, 40, 60, 80]       # candidate box widths and heights, in pixels
        for i in range(32, width, 32):      # sample positions on a 32-pixel grid
            for j in range(32, height, 32):
                for k in dims_boxes:
                    for m in dims_boxes:
                        if (i - k) > 0 and (j - m) > 0:  # discard boxes that spill off the image
                            ancBoxes.append([i, j, k, m])
        return ancBoxes
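On the "how do I train them" part, my current understanding (please correct me if this is wrong) is that each anchor gets a binary label from its best IoU with the ground-truth boxes: positive above roughly 0.7, negative below roughly 0.3, the rest ignored. A minimal NumPy sketch of that step, assuming the thresholds above and that `(x, y)` in my anchors is the bottom-right corner (matching the `(i - k) > 0` check in `anchorBoxGenerator`):

```python
import numpy as np

def iou(box, gt):
    """IoU of two boxes, both in [x1, y1, x2, y2] format."""
    x1 = max(box[0], gt[0]); y1 = max(box[1], gt[1])
    x2 = min(box[2], gt[2]); y2 = min(box[3], gt[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label each [x, y, w, h] anchor: 1 = object, 0 = background, -1 = ignore.
    Assumes (x, y) is the anchor's bottom-right corner (my convention above)."""
    labels = np.full(len(anchors), -1)
    for idx, (x, y, w, h) in enumerate(anchors):
        best = max(iou([x - w, y - h, x, y], gt) for gt in gt_boxes)
        if best >= pos_thr:
            labels[idx] = 1
        elif best < neg_thr:
            labels[idx] = 0
    return labels
```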
I don't have a clear idea of how to proceed from here. If I've misunderstood the RPN or the paper in general, please correct me :) Thanks in advance!
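Edit: to clarify the tf.data part of my question, this is the kind of input pipeline I'm after (a minimal sketch assuming a plain list of JPEG paths; the 224x224 resize and [0, 1] scaling are my assumptions, not requirements of the paper):

```python
import tensorflow as tf

def make_dataset(image_paths, batch_size=2):
    """tf.data pipeline: path -> decoded, resized, batched float images (sketch)."""
    def load(path):
        img = tf.io.read_file(path)
        img = tf.image.decode_jpeg(img, channels=3)
        img = tf.image.resize(img, [224, 224]) / 255.0  # scale pixels to [0, 1]
        return img
    ds = tf.data.Dataset.from_tensor_slices(image_paths)
    ds = ds.map(load, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```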