Question

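I am using GCP Vision's document_text_detection to extract content from receipts. The output is not in a structured format.

I tried to get a structured format by grouping words using the width/height of their bounding boxes, but the result is not what I expect: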

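response = client.document_text_detection(image=image)
items = []
lines = {}  # top y of a line -> [(top y, bottom y), [(x, word), ...]]

# text_annotations[0] is the full text; the rest are individual tokens
for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y

    # start a new line the first time this exact top y is seen
    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]

    # attach the token to the first line whose bottom edge lies below the token's top
    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break

# sort the tokens of each line left to right and join them with spaces
for _, item in lines.items():
    if item[1]:
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join([word for _, word in words]), words))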

alllines = [item[1] for item in items]

print(alllines)

I expect:

['DOLLAR TREE (828) 883-2495',
'#3191',
'108 Store Chestnut Street',
'Suite 2',
'Brevard NC 28712-3775',
'DESCRIPTION QTY PRICE TOTAL',
'GOURMET MILKSHAKE MIXED NUT 1.00 1.00T',
'ENERGY DRINKS 1.00 1.00T',
'GATORADE 1.00 1.00T',
'CHDR POT SKINS 1.00 1.00T',
'Sub Total $ 5.00',
'FOOD TAX $ 0.06',
'SALES TAX $ 0.14',
'$ 5.20']

But the actual output is:

['DOLLAR TREE ( 828 ) 883 - 2495',
 '# 3191',
 '108 Store Chestnut Street',
 'Suite 2',
 'Brevard NC 28712 - 3775',
 'DESCRIPTION QTY PRICE TOTAL',
 'GOURMET MILKSHAKE MIXED NUT 1 1 . . 00 00 1 1 . . 00T 00T',
 'ENERGY DRINKS 1 . 00 1 . 00T',
 'GATORADE 1 . 00 1 . 00T',
 'CHDR POT SKINS 1 . 00 1 . 00T',
 'Sub Total $ 5 . 00',
 'FOOD TAX $ 0 . 06',
 'SALES TAX $ 0 . 14',
 '$ 5 . 20']
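
For illustration only, here is a minimal sketch of one way the two symptoms above (tokens such as "1", ".", "00" joined with spaces, and two receipt rows merged into one) might be approached. It is not part of the Vision API. The function name `group_into_lines`, the tuple layout, and the two thresholds `y_tolerance` and `gap_ratio` are all assumptions made up for this example, and it assumes the receipt is not rotated or skewed. Each word is a plain `(text, x_left, x_right, y_top, y_bottom)` tuple, which could presumably be built from the min/max x and y of `text.bounding_poly.vertices`.

```python
from statistics import median


def group_into_lines(words, y_tolerance=0.5, gap_ratio=0.3):
    """Group (text, x_left, x_right, y_top, y_bottom) tuples into text lines.

    y_tolerance and gap_ratio are hypothetical tuning knobs, both expressed
    as a fraction of the median word height.
    """
    if not words:
        return []
    # Median word height is the scale for both thresholds.
    height = median(w[4] - w[3] for w in words)

    rows = []  # each row: {"y": running mean of centres, "words": [...]}
    for word in sorted(words, key=lambda w: (w[3] + w[4]) / 2):
        centre = (word[3] + word[4]) / 2
        if rows and abs(centre - rows[-1]["y"]) <= y_tolerance * height:
            row = rows[-1]
            row["words"].append(word)
            # Update the running mean of the row's vertical centre.
            row["y"] += (centre - row["y"]) / len(row["words"])
        else:
            rows.append({"y": centre, "words": [word]})

    lines = []
    for row in rows:
        ordered = sorted(row["words"], key=lambda w: w[1])
        text = ordered[0][0]
        for prev, cur in zip(ordered, ordered[1:]):
            gap = cur[1] - prev[2]  # horizontal space between the two boxes
            # Insert a space only when the boxes are visibly apart.
            text += (" " if gap > gap_ratio * height else "") + cur[0]
        lines.append(text)
    return lines


# Made-up coordinates, not real API output.
sample = [
    ("GATORADE", 10, 120, 99, 119),
    ("1", 200, 208, 100, 120),
    (".", 209, 212, 100, 120),
    ("00", 213, 230, 100, 120),
    ("1", 260, 268, 100, 120),
    (".", 269, 272, 100, 120),
    ("00T", 273, 300, 100, 120),
    ("Sub", 10, 50, 130, 150),
    ("Total", 58, 110, 130, 150),
    ("$", 200, 210, 130, 150),
    ("5", 220, 228, 130, 150),
    (".", 229, 232, 130, 150),
    ("00", 233, 250, 130, 150),
]
print(group_into_lines(sample))
# ['GATORADE 1.00 1.00T', 'Sub Total $ 5.00']
```

Comparing vertical centres with a tolerance, rather than matching on the exact top y, is meant to keep neighbouring rows apart while still tolerating a pixel or two of jitter. Joining on the horizontal gap is what turns "1 . 00" back into "1.00". Both thresholds would likely need tuning on real receipts.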

Google Vision document_text RTF format issue

0 answers: