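I am using Google Cloud Vision's document_text_detection to extract the contents of a receipt. The output is not returned in a structured format.
I tried to build a structured format from the bounding boxes (width/height) of the lines and words, but the result is not what I expect: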
response = client.document_text_detection(image=image)
items = []
lines = {}
# text_annotations[0] is the full text; the remaining entries are individual tokens
for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y
    # start a new line keyed by this token's top y coordinate
    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]
    # attach the token to the first line whose bottom edge lies below the token's top
    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break
# sort each line's tokens left to right and join them with spaces
for _, item in lines.items():
    if item[1]:
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join([word for _, word in words]), words))
alllines = []
for i in range(len(items)):
    alllines.append(items[i][1])
print(alllines)
I expect:
['DOLLAR TREE (828) 883-2495',
'#3191',
'108 Store Chestnut Street',
'Suite 2',
'Brevard NC 28712-3775',
'DESCRIPTION QTY PRICE TOTAL',
'GOURMET MILKSHAKE MIXED NUT 1.00 1.00T',
'ENERGY DRINKS 1.00 1.00T',
'GATORADE 1.00 1.00T',
'CHDR POT SKINS 1.00 1.00T',
'Sub Total $ 5.00',
'FOOD TAX $ 0.06',
'SALES TAX $ 0.14',
'$ 5.20']
But the actual output is:
['DOLLAR TREE ( 828 ) 883 - 2495',
'# 3191',
'108 Store Chestnut Street',
'Suite 2',
'Brevard NC 28712 - 3775',
'DESCRIPTION QTY PRICE TOTAL',
'GOURMET MILKSHAKE MIXED NUT 1 1 . . 00 00 1 1 . . 00T 00T',
'ENERGY DRINKS 1 . 00 1 . 00T',
'GATORADE 1 . 00 1 . 00T',
'CHDR POT SKINS 1 . 00 1 . 00T',
'Sub Total $ 5 . 00',
'FOOD TAX $ 0 . 06',
'SALES TAX $ 0 . 14',
'$ 5 . 20']
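
The actual output suggests that each entry in text_annotations[1:] is a single token, with punctuation such as "(", "." and "-" arriving as separate entries, so joining everything with ' ' puts a space around each of them. A possible direction, shown only as a minimal sketch: group tokens by whether their vertical centre falls inside a line's band, and insert a space only when the horizontal gap to the previous token is wide. The Token tuple, the group_into_lines and join_line helpers, the max_gap threshold, and the sample coordinates are all hypothetical. With the real response the four values would presumably come from vertices[0].x, vertices[1].x, vertices[0].y and vertices[3].y, assuming upright, unrotated text.

```python
from typing import List, Tuple

# Hypothetical token record: (x_left, x_right, y_top, y_bottom, text).
Token = Tuple[int, int, int, int, str]


def group_into_lines(tokens: List[Token]) -> List[List[Token]]:
    """Group tokens whose vertical centre falls inside an existing line's band."""
    lines: List[List[Token]] = []
    for tok in sorted(tokens, key=lambda t: (t[2], t[0])):
        centre = (tok[2] + tok[3]) / 2
        for line in lines:
            # The first token of a line defines that line's vertical band.
            if line[0][2] <= centre <= line[0][3]:
                line.append(tok)
                break
        else:
            # No existing band contains this token, so it starts a new line.
            lines.append([tok])
    return lines


def join_line(line: List[Token], max_gap: int = 5) -> str:
    """Join tokens left to right, adding a space only when the gap is wide."""
    ordered = sorted(line, key=lambda t: t[0])
    text = ordered[0][4]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur[0] - prev[1]  # distance from previous right edge to current left edge
        text += (" " if gap > max_gap else "") + cur[4]
    return text


# Made-up coordinates for illustration only, not taken from a real receipt.
sample = [
    (10, 60, 100, 120, "GATORADE"),
    (200, 208, 101, 121, "1"),
    (209, 212, 102, 121, "."),
    (213, 230, 100, 120, "00"),
    (260, 268, 101, 121, "1"),
    (269, 272, 102, 121, "."),
    (273, 300, 100, 120, "00T"),
    (10, 40, 130, 150, "Sub"),
    (50, 90, 131, 150, "Total"),
]
result = [join_line(line) for line in group_into_lines(sample)]
print(result)  # ['GATORADE 1.00 1.00T', 'Sub Total']
```

The max_gap value would need tuning per image, since it depends on font size and scan resolution.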