Question

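I am fairly new to the Google Cloud Vision API, so I apologize if the answer is obvious. I have noticed that, for some images, I get different OCR results from the Google Cloud Vision API drag-and-drop demo (https://cloud.google.com/vision/docs/drag-and-drop) and from local image detection in Python.

My code is as follows: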

import io
# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = "./test0004a.jpg"

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)

response = client.text_detection(image=image)
texts = response.text_annotations

print('Texts:')
for text in texts:
#    print('\n"{}"'.format(text.description.encode('utf-8')))
    print('\n"{}"'.format(text.description.encode('ascii','ignore')))

    vertices = (['({},{})'.format(vertex.x, vertex.y)
                for vertex in text.bounding_poly.vertices])

    print('bounds: {}'.format(','.join(vertices)))

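A sample image that shows the problem is attached (Sample Image).

The Python code above returns nothing, but the drag-and-drop demo in the browser correctly identifies "2340" as the text. Shouldn't Python and the browser return the same result? If not, why not? Do I need to include additional parameters in my code?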

Answer 1

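The problem here is that you are using TEXT_DETECTION instead of DOCUMENT_TEXT_DETECTION, which is the feature used by the Drag and Drop example page you shared.

By changing the method to document_text_detection(), you should get the desired result (I tested it with your code, and it worked):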

# Using TEXT_DETECTION
response = client.text_detection(image=image)

# Using DOCUMENT_TEXT_DETECTION
response = client.document_text_detection(image=image)

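Although both methods can be used for OCR, as described in the documentation, DOCUMENT_TEXT_DETECTION is optimized for dense text and documents. The image you shared is not of very high quality and its text is not clear, so for this kind of image DOCUMENT_TEXT_DETECTION may give better results than TEXT_DETECTION.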
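To see the difference on a given image, both features can be run side by side. The following is a minimal sketch, not a definitive implementation. The helper names first_description and compare_ocr are hypothetical, and it assumes a 2.x or later client library (where vision.Image replaces types.Image) and that the first entry of text_annotations holds the whole detected text.

```python
import io


def first_description(response):
    """Return the full detected text, or '' when nothing was found.

    Assumption: the first entry of `text_annotations`, when present,
    holds the entire detected text; later entries are individual words.
    """
    annotations = list(response.text_annotations)
    return annotations[0].description if annotations else ""


def compare_ocr(file_name):
    """Run both OCR features on one local image and return their texts."""
    # Deferred import so first_description() can be used without the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with io.open(file_name, "rb") as image_file:
        content = image_file.read()

    # Assumption: client library 2.x or later. Older releases, such as the
    # one used in the question, build the image with types.Image instead.
    image = vision.Image(content=content)

    results = {}
    for name, method in (
        ("TEXT_DETECTION", client.text_detection),
        ("DOCUMENT_TEXT_DETECTION", client.document_text_detection),
    ):
        response = method(image=image)
        # An empty result can also mean the request failed, so check first.
        if response.error.message:
            raise RuntimeError(response.error.message)
        results[name] = first_description(response)
    return results
```

Calling compare_ocr("./test0004a.jpg") with valid credentials should return a dictionary with one entry per feature, which makes it easy to spot an image where only one of the two finds any text.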

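There are other examples where DOCUMENT_TEXT_DETECTION works better than TEXT_DETECTION. Note, however, that this is not always the case, and TEXT_DETECTION may still give better results in some situations.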
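Since neither feature wins on every image, one option is to try one and fall back to the other only when the first finds nothing. This is a rough sketch of that idea rather than a recommended practice. The function detect_with_fallback is a hypothetical helper, and it assumes each detection method accepts an image keyword argument and returns a response with a text_annotations sequence, as in the question's code.

```python
def detect_with_fallback(primary, fallback, image):
    """Call `primary`; if it detects no text, call `fallback` instead.

    `primary` and `fallback` are detection callables such as
    client.text_detection and client.document_text_detection.
    """
    response = primary(image=image)
    if len(response.text_annotations) > 0:
        return response
    # Nothing detected by the first feature, so try the second one.
    return fallback(image=image)
```

With a client and image built as in the question, this could be called as detect_with_fallback(client.text_detection, client.document_text_detection, image). The fallback costs a second API request, so it only makes sense when empty results are rare.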
