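I have a large number of PDF files spread across multiple folders (a tree structure) in Google Drive. I want to keep the PDF files and also create an OCR'd copy of each PDF in Google Docs format. Each Google Docs file needs to have the same name as its PDF.
How can I do this?
As part of this, I tried to convert at least one file from PDF in code, but I ran into a problem with that as well.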
function pdfToDoc() {
  var fileBlob = DriveApp.getFileById('<ID>').getBlob();
  var resource = {
    title: fileBlob.getName(),
    mimeType: fileBlob.getContentType()
  };
  var options = {
    ocr: true
  };
  var docFile = Drive.Files.insert(resource, fileBlob, options); // <-- Google said "Empty response (line 10, file "Code")"
  Logger.log(docFile.alternateLink);
}
Answer 0 (score: 0)
I followed this tutorial, with a few changes because I am using v3 of the Drive API. Here is the snippet:
var blob = DriveApp.getFileById('FILE_ID').getBlob();
Logger.log(blob)
var text = pdfToText(blob, {ocrLanguage: "en"});
Logger.log(text);
/**
* Convert pdf file (blob) to a text file on Drive, using built-in OCR.
* By default, the text file will be placed in the root folder, with the same
* name as source pdf (but extension 'txt'). Options:
*/
function pdfToText(pdfFile, options) {
  // Ensure the Advanced Drive Service is enabled
  try {
    Drive.Files.list();
  } catch (e) {
    throw new Error("Enable 'Drive API' in Resources - Advanced Google Services.");
  }

  // Prepare resource object for file creation.
  // The name is hard-coded here; pdfFile.getName() would reuse the PDF's own name.
  var parents = [];
  var pdfName = "Sample Docs";
  Logger.log(pdfName);
  var resource = {
    // Drive API v3 uses `name` (v2 used `title`); drop a trailing ".pdf"
    name: pdfName.replace(/\.pdf$/i, ''),
    mimeType: MimeType.GOOGLE_DOCS,
    parents: parents
  };

  // Save PDF as GDOC. v3 has no `ocr` flag: the conversion is requested by the
  // Google Docs target mimeType, and `ocrLanguage` is only a language hint.
  var insertOpts = {
    ocrLanguage: (options && options.ocrLanguage) || 'en'
  };
  Logger.log(resource.name);
  var gdocFile = Drive.Files.create(resource, pdfFile, insertOpts);

  // Get text from GDOC
  var gdocDoc = DocumentApp.openById(gdocFile.id);
  var text = gdocDoc.getBody().getText();

  // Save the extracted text as a plain-text file on Drive
  resource.name = pdfName.replace(/\.pdf$/i, '') + '.txt';
  resource.mimeType = MimeType.PLAIN_TEXT;
  var textBlob = Utilities.newBlob(text, MimeType.PLAIN_TEXT, resource.name);
  var textFile = Drive.Files.create(resource, textBlob);

  return text;
}
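Since the question asks for each Google Doc to carry the same name as its PDF, the name derivation can be isolated in a small helper. This is a minimal sketch, assuming "same name" means the PDF's file name without its `.pdf` extension; `docNameFromPdf` and `txtNameFromPdf` are hypothetical names and not part of the tutorial.

```javascript
// Hypothetical helper: derive the Google Doc name from the PDF name by
// dropping a trailing ".pdf" (case-insensitive). Other names pass through.
function docNameFromPdf(pdfName) {
  return pdfName.replace(/\.pdf$/i, '');
}

// Hypothetical helper: derive the plain-text file name the same way.
function txtNameFromPdf(pdfName) {
  return docNameFromPdf(pdfName) + '.txt';
}
```

Inside `pdfToText`, the result of `docNameFromPdf(pdfFile.getName())` could then be used as the `name` of the resource, if that is the naming you want.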
DriveApp on its own cannot convert a PDF to a Google Doc directly, so I used the Advanced Drive Service. See the linked guide on how to enable advanced services.
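The snippet converts a single file, while the question is about PDFs spread over a folder tree. One way to cover the tree might be a recursive walk like the sketch below. It assumes `folder` behaves like an Apps Script `Folder` (with `getFilesByType` and `getFolders`, both returning iterators), and `convert` is a hypothetical callback you supply, for example one that wraps `pdfToText`. The name `convertPdfsInTree` is also made up for this illustration.

```javascript
// Sketch only: visit every PDF in a folder and all of its subfolders.
// Calls convert(file, folder) once per PDF and returns how many were visited.
function convertPdfsInTree(folder, convert) {
  var count = 0;

  // 'application/pdf' is the same value as MimeType.PDF in Apps Script
  var files = folder.getFilesByType('application/pdf');
  while (files.hasNext()) {
    convert(files.next(), folder);
    count++;
  }

  // Recurse into each subfolder
  var subfolders = folder.getFolders();
  while (subfolders.hasNext()) {
    count += convertPdfsInTree(subfolders.next(), convert);
  }
  return count;
}
```

It could be started with something like `convertPdfsInTree(DriveApp.getFolderById('FOLDER_ID'), function (file, folder) { pdfToText(file.getBlob(), {ocrLanguage: 'en'}); })`, where `FOLDER_ID` is a placeholder. To have each Google Doc land next to its PDF, the callback would need to pass `folder.getId()` through to the `parents` list of the resource, which in Drive API v3 is a list of folder IDs. For a very large tree, a single run may hit the Apps Script execution time limit, so the work may need to be split across runs.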
Hope this helps.