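I used a text detection model that returns bounding-box coordinates as polygons. I converted the polygons to axis-aligned rectangles so that I can crop the text regions from the image. The resulting boxes come out in a shuffled order, and I have not been able to sort them. As far as I can tell, the boxes are ordered by y3. However, when a line contains curved text (as in the image below), that order breaks down. I need to sort the boxes into reading order before passing them to the text extraction model.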
Image with polygon coordinates
Converting the polygons to rectangles to crop the text regions:
same image with converted rectangle bounding box
import cv2
import pandas as pd

img_name = 'rre7'
orig = cv2.imread('CRAFT-pytorch/test/' + str(img_name) + '.jpg')
colnames = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4']
df = pd.read_csv('result/res_' + str(img_name) + '.txt', header=None,
                 delimiter=',', names=colnames)
rect = []
boxes = df.values
for i, (x1, y1, x2, y2, x3, y3, x4, y4) in enumerate(boxes):
    # axis-aligned rectangle enclosing the polygon
    startX = min([x1, x2, x3, x4])
    startY = min([y1, y2, y3, y4])
    endX = max([x1, x2, x3, x4])
    endY = max([y1, y2, y3, y4])
    #print([startX,startY,endX,endY])
    rect.append([startX, startY, endX, endY])
# sort by the top edge first
rect.sort(key=lambda b: b[1])
print("After sorting")
print('\n')
# initially the line bottom is set to be the bottom of the first rect
# (each rect is [startX, startY, endX, endY], so index 3 is already the
# bottom edge; adding it to the top, as for [x, y, w, h], would be wrong)
line_bottom = rect[0][3]
line_begin_idx = 0
for i in range(len(rect)):
    # when a new box's top is below current line's bottom
    # it's a new line
    if rect[i][1] > line_bottom:
        # sort the previous line by their x
        rect[line_begin_idx:i] = sorted(rect[line_begin_idx:i],
                                        key=lambda b: b[0])
        line_begin_idx = i
    # regardless if it's a new line or not
    # always update the line bottom
    line_bottom = max(rect[i][3], line_bottom)
# sort the last line
rect[line_begin_idx:] = sorted(rect[line_begin_idx:], key=lambda b: b[0])
for i, (startX, startY, endX, endY) in enumerate(rect):
    roi = orig[startY:endY, startX:endX]
    # the 'gray/' directory must already exist
    cv2.imwrite('gray/' + str(img_name) + '_' + str(i + 1) + '.jpg', roi)
In this case, the polygon bounding-box coordinates, each followed by the detected text, are:
146,36,354,34,354,82,146,84 "Australia"
273,78,434,151,411,201,250,129 "Collection"
146,97,250,97,250,150,146,150 "Vines"
77,166,131,126,154,158,99,197 "Old"
242,215,361,241,354,273,235,248 "Valley"
140,247,224,219,234,250,150,277 "Eden"
194,298,306,296,307,324,194,325 "Shiraz"
232,406,363,402,364,421,233,426 "Vintage"
152,402,216,405,215,425,151,422 "2008"
124,470,209,480,207,500,122,490 "South"
227,481,387,472,389,494,228,503 "Australia"
222,562,312,564,311,585,222,583 "Gibson"
198,564,217,564,217,584,198,584 "by"
386,570,421,570,421,600,386,600 "750ml"
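The expected output, however, is the coordinates sorted in the order in which the text appears: Australia -> Old -> Vines -> Collection -> Eden -> Valley -> Shiraz -> 2008 -> Vintage -> South -> Australia -> by -> Gibson -> 750ml.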
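One possible way to get that order is to group the boxes into lines by their polygon centers instead of by the enclosing rectangles, since the rectangle around a tilted word can overlap the line above it. The following is only a sketch under some assumptions: the function name `sort_reading_order` and the `gap_factor` parameter are made up for illustration, the four polygon points are assumed to be ordered top-left, top-right, bottom-right, bottom-left (as in the coordinates above), and a new line is assumed to start whenever the vertical gap between consecutive centers exceeds the median text height. Layouts with strongly curved or widely varying text sizes may need a different threshold.

```python
import math
from statistics import median


def sort_reading_order(polys, gap_factor=1.0):
    """Sort polygons (x1, y1, ..., x4, y4) top-to-bottom, then left-to-right.

    Hypothetical helper: lines are formed by splitting the boxes, sorted by
    center y, wherever two consecutive centers are further apart than
    gap_factor * median text height.
    """
    if not polys:
        return []
    items = []
    for p in polys:
        xs, ys = p[0::2], p[1::2]
        cx, cy = sum(xs) / 4.0, sum(ys) / 4.0
        # Text height approximated by the left edge (point 1 -> point 4);
        # this assumes the point order TL, TR, BR, BL.
        height = math.hypot(p[6] - p[0], p[7] - p[1])
        items.append((cx, cy, height, p))

    threshold = gap_factor * median(h for _, _, h, _ in items)

    # Walk the boxes from top to bottom and start a new line on a large gap.
    items.sort(key=lambda t: t[1])
    lines = [[items[0]]]
    for prev, cur in zip(items, items[1:]):
        if cur[1] - prev[1] > threshold:
            lines.append([])
        lines[-1].append(cur)

    # Within each line, order the boxes by center x.
    ordered = []
    for line in lines:
        ordered.extend(t[3] for t in sorted(line, key=lambda t: t[0]))
    return ordered


# Polygons and detected text taken from the list in the question.
boxes = [
    ((146, 36, 354, 34, 354, 82, 146, 84), "Australia"),
    ((273, 78, 434, 151, 411, 201, 250, 129), "Collection"),
    ((146, 97, 250, 97, 250, 150, 146, 150), "Vines"),
    ((77, 166, 131, 126, 154, 158, 99, 197), "Old"),
    ((242, 215, 361, 241, 354, 273, 235, 248), "Valley"),
    ((140, 247, 224, 219, 234, 250, 150, 277), "Eden"),
    ((194, 298, 306, 296, 307, 324, 194, 325), "Shiraz"),
    ((232, 406, 363, 402, 364, 421, 233, 426), "Vintage"),
    ((152, 402, 216, 405, 215, 425, 151, 422), "2008"),
    ((124, 470, 209, 480, 207, 500, 122, 490), "South"),
    ((227, 481, 387, 472, 389, 494, 228, 503), "Australia"),
    ((222, 562, 312, 564, 311, 585, 222, 583), "Gibson"),
    ((198, 564, 217, 564, 217, 584, 198, 584), "by"),
    ((386, 570, 421, 570, 421, 600, 386, 600), "750ml"),
]
label_of = dict(boxes)  # polygon tuple -> detected text
ordered = sort_reading_order([p for p, _ in boxes])
print(" -> ".join(label_of[p] for p in ordered))
```

The sorted polygons can then be converted to rectangles and cropped in that order. Because the split only looks at the gap between neighboring centers, a long chain of slightly offset boxes could merge two lines, so `gap_factor` may need tuning per image.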