Question

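I am using a text detection model that outputs bounding-box coordinates. I converted the polygons to rectangles so that I can crop the text regions from the image. The resulting bounding boxes come out shuffled, and I have not been able to sort them. As far as I understand, the boxes are ordered by y3. However, when a line contains curved text (as in the image below), that order breaks, so I need to sort the boxes before passing them to the text-extraction model.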

Image with polygon coordinates

Polygons converted to rectangles for cropping the text regions

same image with converted rectangle bounding box

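import cv2
import pandas as pd

img_name='rre7'
orig=cv2.imread('CRAFT-pytorch/test/'+str(img_name)+'.jpg')
# each row of the result file holds one polygon: x1,y1,x2,y2,x3,y3,x4,y4
colnames=['x1','y1','x2','y2','x3','y3','x4','y4']
df=pd.read_csv('result/res_'+str(img_name)+'.txt',header=None,
               delimiter=',', names=colnames)
# rect collects axis-aligned boxes as [startX,startY,endX,endY]
rect=[]
boxes=df.values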
for i,(x1,y1,x2,y2,x3,y3,x4,y4) in enumerate(boxes):
    startX = min([x1,x2,x3,x4])
    startY = min([y1,y2,y3,y4])
    endX = max([x1,x2,x3,x4])
    endY = max([y1,y2,y3,y4])
    #print([startX,startY,endX,endY])
    rect.append([startX,startY,endX,endY])
rect.sort(key=lambda b: b[1])
print("After sorting")
print('\n')
# initially the line bottom is set to be the bottom (endY) of the first rect
# (rect stores [startX,startY,endX,endY], so index 3 is already the bottom edge)
line_bottom = rect[0][3]
line_begin_idx = 0
for i in range(len(rect)):
    # when a new box's top is below current line's bottom
    # it's a new line
    if rect[i][1] > line_bottom:
        # sort the previous line by their x
        rect[line_begin_idx:i] = sorted(rect[line_begin_idx:i], key=lambda b: b[0])
        line_begin_idx = i
    # regardless if it's a new line or not
    # always update the line bottom
    line_bottom = max(rect[i][3], line_bottom)
# sort the last line
rect[line_begin_idx:] = sorted(rect[line_begin_idx:], key=lambda b: b[0])
for i,(startX,startY, endX,endY) in enumerate(rect):
    roi = orig[startY:endY, startX:endX]   
    cv2.imwrite('gray/'+str(img_name)+'_'+str(i+1)+'.jpg',roi)

In this case, the polygon bounding-box coordinates, with the detected text, are:

146,36,354,34,354,82,146,84 "Australia"

273,78,434,151,411,201,250,129 "Collection"

146,97,250,97,250,150,146,150 "Vine"

77,166,131,126,154,158,99,197 "Old"

242,215,361,241,354,273,235,248 "Valley"

140,247,224,219,234,250,150,277 "Eden"

194,298,306,296,307,324,194,325 "Shiraz"

232,406,363,402,364,421,233,426 "Vintage"

152,402,216,405,215,425,151,422 "2008"

124,470,209,480,207,500,122,490 "South"

227,481,387,472,389,494,228,503 "Australia"

222,562,312,564,311,585,222,583 "Gibson"

198,564,217,564,217,584,198,584 "by"

386,570,421,570,421,600,386,600 "750ml"
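
A small check, using only the first two polygons listed above, may help show why grouping axis-aligned rectangles into lines goes wrong here. The helper name `to_rect` is hypothetical and not part of the original script; it mirrors the min/max conversion used in the question.

```python
def to_rect(poly):
    """Convert x1,y1,...,x4,y4 into (startX, startY, endX, endY)."""
    xs, ys = poly[0::2], poly[1::2]
    return min(xs), min(ys), max(xs), max(ys)

# First text line of the image
australia = to_rect([146, 36, 354, 34, 354, 82, 146, 84])
# Slanted word that belongs to the second text line
collection = to_rect([273, 78, 434, 151, 411, 201, 250, 129])

# Positive value means the two rectangles share a vertical range
vertical_overlap = min(australia[3], collection[3]) - max(australia[1], collection[1])
print(vertical_overlap)  # 6
```

Because the slanted polygon is inflated to a rectangle spanning y = 78 to y = 201, it overlaps the box above it by 6 pixels, so a test of the form "top is below the current line's bottom" would place both words on the same line.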

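However, the expected output is the coordinates sorted in the order in which the text appears: Australia -> Old -> Vine -> Collection -> Eden -> Valley -> Shiraz -> 2008 -> Vintage -> South -> Australia -> by -> Gibson -> 750ml.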
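
One possible way to obtain this order is sketched below. It works on the original polygons instead of the inflated rectangles: boxes are sorted by centroid y, a new line starts when the vertical gap to the previous centroid exceeds a fraction of the taller box's height, and each line is then sorted by centroid x. The function name `sort_reading_order`, the `tol=0.5` threshold, and the assumption that points are listed top-left, top-right, bottom-right, bottom-left are all illustrative choices, not something the detection model guarantees. It is a heuristic that happens to fit this image and may need tuning for others.

```python
import math

def sort_reading_order(polys, tol=0.5):
    """Return indices of polys (x1,y1,...,x4,y4 each) in approximate reading order."""
    if not polys:
        return []
    feats = []
    for idx, p in enumerate(polys):
        cx = sum(p[0::2]) / 4.0
        cy = sum(p[1::2]) / 4.0
        # Text height: mean length of the left (p1-p4) and right (p2-p3) edges
        left = math.hypot(p[0] - p[6], p[1] - p[7])
        right = math.hypot(p[2] - p[4], p[3] - p[5])
        feats.append((idx, cx, cy, (left + right) / 2.0))
    feats.sort(key=lambda f: f[2])  # top to bottom by centroid y

    lines, current = [], [feats[0]]
    for prev, cur in zip(feats, feats[1:]):
        # Start a new line when the centroid gap is large relative to text height
        if cur[2] - prev[2] > tol * max(prev[3], cur[3]):
            lines.append(current)
            current = [cur]
        else:
            current.append(cur)
    lines.append(current)

    order = []
    for line in lines:
        order.extend(f[0] for f in sorted(line, key=lambda f: f[1]))  # left to right
    return order

polys = [
    [146, 36, 354, 34, 354, 82, 146, 84],      # 0 Australia
    [273, 78, 434, 151, 411, 201, 250, 129],   # 1 Collection
    [146, 97, 250, 97, 250, 150, 146, 150],    # 2 Vine
    [77, 166, 131, 126, 154, 158, 99, 197],    # 3 Old
    [242, 215, 361, 241, 354, 273, 235, 248],  # 4 Valley
    [140, 247, 224, 219, 234, 250, 150, 277],  # 5 Eden
    [194, 298, 306, 296, 307, 324, 194, 325],  # 6 Shiraz
    [232, 406, 363, 402, 364, 421, 233, 426],  # 7 Vintage
    [152, 402, 216, 405, 215, 425, 151, 422],  # 8 2008
    [124, 470, 209, 480, 207, 500, 122, 490],  # 9 South
    [227, 481, 387, 472, 389, 494, 228, 503],  # 10 Australia
    [222, 562, 312, 564, 311, 585, 222, 583],  # 11 Gibson
    [198, 564, 217, 564, 217, 584, 198, 584],  # 12 by
    [386, 570, 421, 570, 421, 600, 386, 600],  # 13 750ml
]
order = sort_reading_order(polys)
print(order)  # [0, 3, 2, 1, 5, 4, 6, 8, 7, 9, 10, 12, 11, 13]
```

The returned indices can be used to reorder both the polygons and the cropped rectangles before they are passed to the text-extraction model. Comparing each box only with the previous one keeps the sketch short, but it can chain many steeply curved boxes into a single line, so a stricter rule may be needed for other layouts.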

Sort detected text bounding-box coordinates by their order of appearance in the image

0 answers: