I have the following implementation for uploading a PDF file to Google Docs (taken from the gdata API samples):
def UploadResourceSample():
    """Upload a document, and convert to Google Docs."""
    client = CreateClient()
    doc = gdata.docs.data.Resource(type='document', title='My Sample Doc')
    # This is a convenient MS Word doc that we know exists
    path = _GetDataFilePath('test.0.doc')
    print 'Selected file at: %s' % path
    # Create a MediaSource, pointing to the file
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/msword')
    # Pass the MediaSource when creating the new Resource
    doc = client.CreateResource(doc, media=media)
    print 'Created, and uploaded:', doc.title.text, doc.resource_id.text
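Now I would like to run OCR text recognition on the uploaded file, but I am not sure how to enable OCR in the gdata docs Python API. So my question is: is there a way to enable OCR on a PDF file using the gdata Python v3.0 API?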
Answer 0 (score: 3)
I managed to get my PDF document OCR'd using the following code:
def UploadResourceSample(filename, filepath, fullpath):
    """Upload a document, and convert to Google Docs."""
    client = CreateClient()
    doc = gdata.docs.data.Resource(type='document', title=filename)
    path = fullpath
    print 'Selected file at: %s' % path
    # Create a MediaSource, pointing to the file
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/pdf')
    # Pass the MediaSource when creating the new Resource
    create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?ocr=true&ocr-language=de'
    doc = client.CreateResource(doc, create_uri=create_uri, media=media)
    print 'Created, and uploaded:', doc.title.text, doc.resource_id.text
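
The code above hardcodes the query string with German (`de`) as the OCR language. If the language needs to vary, the URI could be built with a small helper along these lines. This is only a sketch: `build_ocr_upload_uri` is a hypothetical name, not part of gdata, and it assumes the only parameters needed are the `ocr` and `ocr-language` ones used above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, which gdata targets


def build_ocr_upload_uri(base_uri, language=None):
    """Return base_uri with the OCR query parameters appended.

    Hypothetical helper (not part of gdata). `language` is passed
    through unchanged as the `ocr-language` value, e.g. 'de'.
    """
    params = [('ocr', 'true')]
    if language:
        params.append(('ocr-language', language))
    # Use '&' if the base URI already carries a query string.
    separator = '&' if '?' in base_uri else '?'
    return base_uri + separator + urlencode(params)
```

In the upload function it would replace the string concatenation, for example `create_uri = build_ocr_upload_uri(gdata.docs.client.RESOURCE_UPLOAD_URI, 'de')`.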