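I am trying to reproduce the output of the "Document Text Detection" sample UI uploader through the Google Vision API. However, the output I get from the sample code only gives individual characters, whereas I need the characters grouped into words.
Is there a feature in the library that groups the output by "word", either from the DOCUMENT_TEXT_DETECTION endpoint or from the image.detect_full_text() function in Python?
I am not looking for full-text extraction, because my .jpg files are not visually structured in a way that the image.detect_text() function handles well.
Google's sample code: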
import io

from google.cloud import vision


def detect_document(path):
    """Detects document features in an image."""
    vision_client = vision.Client()
    # Read the raw image bytes from disk.
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision_client.image(content=content)
    document = image.detect_full_text()
    for page in document.pages:
        for block in page.blocks:
            # Gather every word in the block, then every symbol in those words.
            block_words = []
            for paragraph in block.paragraphs:
                block_words.extend(paragraph.words)
            block_symbols = []
            for word in block_words:
                block_symbols.extend(word.symbols)
            # Concatenate the symbols into the text of the whole block.
            block_text = ''
            for symbol in block_symbols:
                block_text = block_text + symbol.text
            print('Block Content: {}'.format(block_text))
            print('Block Bounds:\n {}'.format(block.bounding_box))
Sample output from the ready-made sample provided by Google:
property {
  detected_languages {
    language_code: "mt"
  }
}
bounding_box {
  vertices {
    x: 1193
    y: 1664
  }
  vertices {
    x: 1206
    y: 1664
  }
  vertices {
    x: 1206
    y: 1673
  }
  vertices {
    x: 1193
    y: 1673
  }
}
symbols {
  property {
    detected_languages {
      language_code: "en"
    }
  }
  bounding_box {
    vertices {
      x: 1193
      y: 1664
    }
    vertices {
      x: 1198
      y: 1664
    }
    vertices {
      x: 1198
      y: 1673
    }
    vertices {
      x: 1193
      y: 1673
    }
  }
  text: "P"
}
symbols {
  property {
    detected_languages {
      language_code: "en"
    }
    detected_break {
      type: LINE_BREAK
    }
  }
  bounding_box {
    vertices {
      x: 1200
      y: 1664
    }
    vertices {
      x: 1206
      y: 1664
    }
    vertices {
      x: 1206
      y: 1673
    }
    vertices {
      x: 1200
      y: 1673
    }
  }
  text: "M"
}
block_words
Out[47]:
[property {
  detected_languages {
    language_code: "en"
  }
}
bounding_box {
  vertices {
    x: 1166
    y: 1664
  }
  vertices {
    x: 1168
    y: 1664
  }
  vertices {
    x: 1168
    y: 1673
  }
  vertices {
    x: 1166
    y: 1673
  }
}
symbols {
  property {
    detected_languages {
      language_code: "en"
    }
  }
  bounding_box {
    vertices {
      x: 1166
      y: 1664
    }
    vertices {
      x: 1168
      y: 1664
    }
    vertices {
      x: 1168
      y: 1673
    }
    vertices {
      x: 1166
      y: 1673
    }
  }
  text: "2"
}
Answer 0 (score: 1)
This reply is late, but I think you are looking for something like the following.
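from google.cloud import vision


def parse_image(image_path=None):
    """
    Parse the image using the Google Cloud Vision API and return its text.

    :param image_path: path of the image
    :return: text content
    :rtype: str
    """
    client = vision.ImageAnnotatorClient()
    # Read the bytes inside a context manager so the file is always closed.
    with open(image_path, 'rb') as image_file:
        content = image_file.read()
    # vision.Image is the 2.x name; older releases use vision.types.Image.
    response = client.text_detection(image=vision.Image(content=content))
    annotations = response.text_annotations
    # The first annotation holds the full text; later ones are single words.
    return annotations[0].description if annotations else ''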
This function returns the full text found in the image.
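If you want word-level results rather than one long string, the same response can be used. In a `text_detection` response, the first entry of `text_annotations` holds the whole text and each later entry holds a single word with its `bounding_poly`. The following is a minimal sketch built on that; the helper name `words_with_boxes` is my own (not part of the library), and it takes the already-fetched annotation list so it can be tried without credentials.

```python
def words_with_boxes(text_annotations):
    """Return (word, [(x, y), ...]) pairs from a text_detection response.

    `text_annotations` is assumed to be `response.text_annotations`:
    entry 0 is the full text, entries 1 onward are individual words.
    """
    results = []
    for annotation in text_annotations[1:]:  # skip the full-text entry
        corners = [(v.x, v.y) for v in annotation.bounding_poly.vertices]
        results.append((annotation.description, corners))
    return results
```

With the function above you would call it as `words_with_boxes(response.text_annotations)` before returning.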
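Answer 1 (score: 0)
GCV offers two kinds of detection: 1. text detection and 2. document text detection.
Text detection is meant for picking up some text in an image. It basically gives you the text values it finds. You cannot rely on its accuracy; for example, it is not suitable for reading receipts or other document data.
Document text detection, however, is very accurate and picks up every detail of a document. With this method the words are split apart from one another; for example, 03/12/2017 comes back as 0 3 / 1 2 / and so on, each piece with its own coordinates. This is actually done for better accuracy.
Now, as for your question, you would be better off with the first method, text detection, which gives you complete words along with their coordinates.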
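If you still need document text detection, the per-symbol output can be regrouped into words by walking the page, block, paragraph, word hierarchy shown in the question and joining each word's symbols. This is only a sketch: the helper name `words_from_document` is hypothetical, and it assumes the object passed in exposes `pages`, `blocks`, `paragraphs`, `words`, `symbols` and `bounding_box` as in the question's dump.

```python
def words_from_document(document):
    """Collect each word of a full-text annotation as one string.

    `document` is assumed to be the object returned by
    `image.detect_full_text()` in the question, or
    `response.full_text_annotation` in newer client versions.
    """
    words = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word is stored as a list of symbols; join them.
                    text = ''.join(symbol.text for symbol in word.symbols)
                    words.append((text, word.bounding_box))
    return words
```

For the word in the question's sample output, whose symbols are "P" and "M", this yields the single string "PM" together with the word-level bounding box.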