Sorting detected text bounding-box coordinates by the order in which they appear in the image

Time: 2019-09-04 11:35:34

Tags: computer-vision ocr bounding-box text-recognition

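I am using a text detection model that returns bounding-box coordinates. I convert each polygon to a rectangle so that I can crop the text regions out of the image. The resulting boxes come out in a scrambled order, and I can't get them sorted correctly. As far as I can tell, the boxes are sorted by Y3. However, when a single line contains curved text (as in the image below), that order breaks down, and I need to sort the boxes correctly before passing them to the text recognition model.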
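The Y3 observation can be checked directly against the polygon coordinates listed further down in this question. The sketch below sorts those polygons by y3 alone and reports where the result differs from the expected reading order. The `ROWS`/`EXPECTED` lists are copied from the question; `sort_by_y3` is a hypothetical helper written only for this illustration.

```python
# Polygons from the question as (x1, y1, x2, y2, x3, y3, x4, y4), paired with the word
# that was read in each box.
ROWS = [
    ((146, 36, 354, 34, 354, 82, 146, 84), "AUSTRALIA"),
    ((273, 78, 434, 151, 411, 201, 250, 129), "COLLECTION"),
    ((146, 97, 250, 97, 250, 150, 146, 150), "VINES"),
    ((77, 166, 131, 126, 154, 158, 99, 197), "OLD"),
    ((242, 215, 361, 241, 354, 273, 235, 248), "VALLEY"),
    ((140, 247, 224, 219, 234, 250, 150, 277), "EDEN"),
    ((194, 298, 306, 296, 307, 324, 194, 325), "SHIRAZ"),
    ((232, 406, 363, 402, 364, 421, 233, 426), "VINTAGE"),
    ((152, 402, 216, 405, 215, 425, 151, 422), "2008"),
    ((124, 470, 209, 480, 207, 500, 122, 490), "SOUTH"),
    ((227, 481, 387, 472, 389, 494, 228, 503), "AUSTRALIA"),
    ((222, 562, 312, 564, 311, 585, 222, 583), "GIBSON"),
    ((198, 564, 217, 564, 217, 584, 198, 584), "by"),
    ((386, 570, 421, 570, 421, 600, 386, 600), "750ml"),
]

# Reading order the question asks for
EXPECTED = ["AUSTRALIA", "OLD", "VINES", "COLLECTION", "EDEN", "VALLEY", "SHIRAZ",
            "2008", "VINTAGE", "SOUTH", "AUSTRALIA", "by", "GIBSON", "750ml"]


def sort_by_y3(rows):
    """Return the words ordered by the y3 coordinate (index 5) of each polygon."""
    return [word for poly, word in sorted(rows, key=lambda row: row[0][5])]


if __name__ == "__main__":
    got = sort_by_y3(ROWS)
    for pos, (actual, wanted) in enumerate(zip(got, EXPECTED)):
        if actual != wanted:
            print(f"position {pos}: got {actual}, expected {wanted}")
```

On this data, y3 alone swaps three pairs: OLD/VINES, 2008/VINTAGE and SOUTH/AUSTRALIA. In each pair the two words sit on the same visual line, but the left word's y3 is slightly lower in the image than the right word's.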

Image with polygon coordinates


Converting the polygons to rectangles to crop the text regions:

The same image with the converted rectangular bounding boxes


import cv2
import pandas as pd

img_name = 'rre7'
orig = cv2.imread('CRAFT-pytorch/test/' + img_name + '.jpg')
colnames = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4']
df = pd.read_csv('result/res_' + img_name + '.txt', header=None,
                 delimiter=',', names=colnames)
rect = []
boxes = df.values
for x1, y1, x2, y2, x3, y3, x4, y4 in boxes:
    # axis-aligned rectangle [startX, startY, endX, endY] enclosing the polygon
    startX = min(x1, x2, x3, x4)
    startY = min(y1, y2, y3, y4)
    endX = max(x1, x2, x3, x4)
    endY = max(y1, y2, y3, y4)
    rect.append([startX, startY, endX, endY])
rect.sort(key=lambda b: b[1])
print("After sorting")
print('\n')
# initially the line bottom is set to be the bottom of the first rect;
# rect holds [startX, startY, endX, endY], so a box's bottom edge is endY (index 3)
line_bottom = rect[0][3]
line_begin_idx = 0
for i in range(len(rect)):
    # when a new box's top is below the current line's bottom,
    # it starts a new line
    if rect[i][1] > line_bottom:
        # sort the previous line by x
        rect[line_begin_idx:i] = sorted(rect[line_begin_idx:i], key=lambda b: b[0])
        line_begin_idx = i
    # regardless of whether it's a new line or not,
    # always update the line bottom
    line_bottom = max(rect[i][3], line_bottom)
# sort the last line
rect[line_begin_idx:] = sorted(rect[line_begin_idx:], key=lambda b: b[0])
for i, (startX, startY, endX, endY) in enumerate(rect):
    roi = orig[startY:endY, startX:endX]
    cv2.imwrite('gray/' + img_name + '_' + str(i + 1) + '.jpg', roi)
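One detail worth guarding in the crop step: a detector's polygons may extend slightly past the image border (an assumption, not something stated in the question). A negative slice bound in `orig[startY:endY, startX:endX]` counts from the end of the array instead of from 0, so such a crop comes out empty or wrong. NumPy slicing also requires integer bounds. Below is a minimal sketch of a hypothetical `polygon_to_rect` helper that rounds outward to whole pixels and clamps to the image size; in the code above it would be called with `img_h, img_w = orig.shape[:2]`.

```python
import math


def polygon_to_rect(poly, img_w, img_h):
    """Axis-aligned box [startX, startY, endX, endY] around a polygon x1, y1, ..., x4, y4.

    The box is rounded outward to whole pixels and clamped to the image, so it can be
    used directly as NumPy slice bounds such as orig[startY:endY, startX:endX].
    """
    xs = poly[0::2]  # x1, x2, x3, x4
    ys = poly[1::2]  # y1, y2, y3, y4
    start_x = max(0, math.floor(min(xs)))
    start_y = max(0, math.floor(min(ys)))
    end_x = min(img_w, math.ceil(max(xs)))
    end_y = min(img_h, math.ceil(max(ys)))
    return [start_x, start_y, end_x, end_y]


if __name__ == "__main__":
    # "COLLECTION" polygon from the question; 500 x 650 is a made-up image size
    print(polygon_to_rect((273, 78, 434, 151, 411, 201, 250, 129), 500, 650))
    # a polygon poking past the left and right edges of a 15-pixel-wide image gets clamped
    print(polygon_to_rect((-3.5, 10, 20, 10, 20, 30, -3.5, 30), 15, 100))
```

The helper only changes how each box is computed; it does not affect the sort order.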

In this case, the polygon bounding-box coordinates of the detected text are:

146,36,354,34,354,82,146,84 "AUSTRALIA"

273,78,434,151,411,201,250,129 "COLLECTION"

146,97,250,97,250,150,146,150 "VINES"

77,166,131,126,154,158,99,197 "OLD"

242,215,361,241,354,273,235,248 "VALLEY"

140,247,224,219,234,250,150,277 "EDEN"

194,298,306,296,307,324,194,325 "SHIRAZ"

232,406,363,402,364,421,233,426 "VINTAGE"

152,402,216,405,215,425,151,422 "2008"

124,470,209,480,207,500,122,490 "SOUTH"

227,481,387,472,389,494,228,503 "AUSTRALIA"

222,562,312,564,311,585,222,583 "GIBSON"

198,564,217,564,217,584,198,584 "by"

386,570,421,570,421,600,386,600 "750ml"

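However, the expected output requires the coordinates to be sorted in the order in which the text appears: ...AUSTRALIA -> OLD -> VINES -> COLLECTION -> EDEN -> VALLEY -> SHIRAZ -> 2008 -> VINTAGE -> SOUTH -> AUSTRALIA -> by -> GIBSON -> 750ml.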
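One way to get that order is to group boxes into lines by their vertical centre instead of by their top edge. This is a swapped-in heuristic, not something from the question. With curved text, neighbouring lines overlap vertically: COLLECTION's top edge (y = 78) lies above AUSTRALIA's bottom edge (y = 84). A rule that starts a new line only when a box's top edge falls below the current line's bottom edge therefore puts the two words on the same line. Comparing each box's centre with the lowest bottom edge seen so far on the current line keeps them apart. The sketch assumes `rect` entries of the form [startX, startY, endX, endY], as built in the question's loop; `sort_reading_order` is a hypothetical name.

```python
def sort_reading_order(rects):
    """Order [startX, startY, endX, endY] boxes line by line, top to bottom, left to right.

    Boxes are visited in order of their vertical centre. A box joins the current line
    while its centre is not below the line's lowest bottom edge; otherwise it starts a
    new line. Each line is then sorted by startX.
    """
    if not rects:
        return []
    by_center = sorted(rects, key=lambda b: (b[1] + b[3]) / 2)
    lines = [[by_center[0]]]
    line_bottom = by_center[0][3]
    for box in by_center[1:]:
        center_y = (box[1] + box[3]) / 2
        if center_y > line_bottom:
            # centre lies below everything on the current line: start a new line
            lines.append([box])
            line_bottom = box[3]
        else:
            lines[-1].append(box)
            line_bottom = max(line_bottom, box[3])
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda b: b[0]))
    return ordered


if __name__ == "__main__":
    # rectangles derived from the question's polygons (min/max of the x and y values)
    labelled = [
        ("AUSTRALIA", [146, 34, 354, 84]), ("COLLECTION", [250, 78, 434, 201]),
        ("VINES", [146, 97, 250, 150]), ("OLD", [77, 126, 154, 197]),
        ("VALLEY", [235, 215, 361, 273]), ("EDEN", [140, 219, 234, 277]),
        ("SHIRAZ", [194, 296, 307, 325]), ("VINTAGE", [232, 402, 364, 426]),
        ("2008", [151, 402, 216, 425]), ("SOUTH", [122, 470, 209, 500]),
        ("AUSTRALIA", [227, 472, 389, 503]), ("GIBSON", [222, 562, 312, 585]),
        ("by", [198, 564, 217, 584]), ("750ml", [386, 570, 421, 600]),
    ]
    label_of = {id(r): word for word, r in labelled}
    ordered = sort_reading_order([r for _, r in labelled])
    print(" -> ".join(label_of[id(r)] for r in ordered))
```

On the 14 boxes from the question this produces exactly the expected order. In the question's code, it would replace everything from `rect.sort(...)` up to the crop loop (`rect = sort_reading_order(rect)`). It is still a heuristic: strongly slanted text whose centre falls above the previous line's bottom edge can chain two lines together, so other images may need a tighter rule.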

0 answers