How can I recursively access the PDF files in Google Drive folders and convert them all to Google Docs?

Date: 2017-06-08 07:12:51

Tags: javascript google-apps-script google-drive-api

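I have a large number of PDF files spread across multiple folders (a tree structure) in Google Drive. I want to keep the PDF files and also create a copy of each PDF in Google Docs format that has been OCR'd. Each Google Docs file needs to have the same name as its PDF.

How can I do this?

As part of this, I tried to convert at least one PDF file in code, but ran into a problem there as well.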

function pdfToDoc() {  
 var fileBlob = DriveApp.getFileById('<ID>').getBlob();  
 var resource = {
   title: fileBlob.getName(),
   mimeType: fileBlob.getContentType()
 };
 var options = {
   ocr: true
 };
 var docFile = Drive.Files.insert(resource, fileBlob, options);   // <-- Google said "Empty response (line 10, file "Code")"
 Logger.log(docFile.alternateLink);  
}

1 Answer:

Answer 0: (score: 0)

I followed this tutorial with a few changes, since I am using v3 of the Drive API. Here is the snippet:

var blob = DriveApp.getFileById('FILE_ID').getBlob();
Logger.log(blob);
var text = pdfToText(blob, {ocrLanguage: "en"});
Logger.log(text);


/**
 * Convert pdf file (blob) to a text file on Drive, using built-in OCR.
 * By default, the text file will be placed in the root folder, with the same
 * name as source pdf (but extension 'txt'). Options:
 */

function pdfToText ( pdfFile, options ) {

  // Ensure Advanced Drive Service is enabled
  try {
    Drive.Files.list();
  }
  catch (e) {
    throw new Error( "Enable 'Drive API' in Resources - Advanced Google Services." );
  }


  // Prepare resource object for file creation (Drive API v3 field names)
  var parents = [];
  var pdfName = "Sample Docs";
  Logger.log(pdfName);
  var resource = {
    name: pdfName,
    mimeType: MimeType.GOOGLE_DOCS,
    parents: parents
  };

  // Save PDF as GDOC. In v3 the target mimeType above requests the
  // conversion; there is no separate 'ocr' flag, only the language hint.
  resource.name = pdfName.replace(/\.pdf$/i, '');
  var insertOpts = {
    'ocrLanguage': 'en'
  };
  Logger.log(resource.name);
  var gdocFile = Drive.Files.create(resource, pdfFile, insertOpts);

  // Get text from GDOC  
  var gdocDoc = DocumentApp.openById(gdocFile.id);
  var text = gdocDoc.getBody().getText();

  // Save the extracted text as a plain-text file on Drive
  resource.name = pdfName.replace(/pdf$/, 'txt');
  resource.mimeType = MimeType.PLAIN_TEXT;

  var textBlob = Utilities.newBlob(text, MimeType.PLAIN_TEXT, resource.name);
  var textFile = Drive.Files.create(resource, textBlob);

  return text;
}
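
The question asks for each Google Doc to carry the same name as its PDF, so the hard-coded "Sample Docs" name above could be replaced with one derived from the source file. The following is only a minimal sketch, assuming Drive API v3 field names (name, parents). buildDocResource is a hypothetical helper, not part of the tutorial or of any Google library, and stripping the ".pdf" extension is a judgment call about what "same name" should mean.

```javascript
/**
 * Hypothetical helper: build the metadata for a Drive v3 Files.create call
 * that converts an uploaded PDF into a Google Doc.
 */
function buildDocResource(pdfName, parentFolderId) {
  var resource = {
    // Drop a trailing ".pdf" (any case) so the Doc matches the PDF's name.
    name: pdfName.replace(/\.pdf$/i, ''),
    // Asking for the Google Docs type is what requests the conversion.
    mimeType: 'application/vnd.google-apps.document'
  };
  if (parentFolderId) {
    // v3 takes a plain array of folder IDs.
    resource.parents = [parentFolderId];
  }
  return resource;
}
```

Inside Apps Script it could then be passed along as Drive.Files.create(buildDocResource(file.getName(), folder.getId()), file.getBlob(), {ocrLanguage: 'en'}), which would place the Doc next to its PDF instead of in the root folder.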

DriveApp on its own cannot convert a PDF directly to a Google Doc, so I used the Advanced Drive Service. Just follow this link on how to enable advanced services.
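
The snippet above handles a single file. For the recursive part of the question, one possible approach is to walk the folder tree with the DriveApp iterators and hand each PDF to a callback. This is a sketch under assumptions: forEachPdf and its onPdf callback are hypothetical names, and the folder argument is expected to behave like a DriveApp Folder (getFilesByType, getFolders).

```javascript
/**
 * Visit every PDF in `folder` and in all of its subfolders.
 * `onPdf` is a hypothetical callback that receives (file, containingFolder).
 * Returns the number of PDFs visited.
 */
function forEachPdf(folder, onPdf) {
  var count = 0;

  // PDFs directly inside this folder.
  var files = folder.getFilesByType('application/pdf');
  while (files.hasNext()) {
    onPdf(files.next(), folder);
    count++;
  }

  // Recurse into each subfolder.
  var subfolders = folder.getFolders();
  while (subfolders.hasNext()) {
    count += forEachPdf(subfolders.next(), onPdf);
  }
  return count;
}
```

A call could start from DriveApp.getFolderById('<FOLDER_ID>') and run the conversion inside the callback. Apps Script limits how long a single execution may run, so a large tree may need to be processed in batches.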

Hope this helps.