Question

This problem has been resolved. I am editing my post to document my experience for posterity and future reference.

Task

I have 117 PDF files (average size ~238 KB) uploaded to Google Drive. I want to convert all of them to Google Docs and keep them in a different Drive folder.

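Problem

I tried converting the files with Drive.Files.insert. In most cases, however, only 5 files could be converted this way before the function exited prematurely with this error:

Limit Exceeded: DriveApp. (line #, file "Code")

The line referenced above is the one where the insert function is called. After this function has been called for the first time, subsequent calls typically fail immediately, without creating any additional Google Docs.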

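Approach

I used 3 main methods to achieve my goal. One was using Drive.Files.insert, as described above. The other two were using Drive.Files.copy and sending a batch of HTTP requests. These last two methods were suggested by Tanaike, and I recommend reading his answer below for more information. The insert and copy functions come from the Google Drive REST v2 API, while batching multiple HTTP requests comes from Drive REST v3.

With Drive.Files.insert, I ran into the execution limit (explained in the Problem section above). One solution was to run the function multiple times, and for that I needed a way to keep track of which files had already been converted. I had two options: a spreadsheet and a continuation token. That gave me 4 different methods to test: the two insert variants mentioned in this paragraph, batching HTTP requests, and calling Drive.Files.copy.

Because team drives behave differently from regular drives, I felt it necessary to try each of those methods twice: once with the PDFs in a regular non-Team Drive folder and once with them in a folder under a Team Drive. In total, this means I had 8 different methods to test.

These are the exact functions I used. Each was used twice, the only variation being the IDs of the source and destination folders (for the reasons given above):

Method A: Using Drive.Files.insert and a spreadsheet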

function toDocs() {
  var sheet = SpreadsheetApp.openById(/* spreadsheet id*/).getSheets()[0];
  var range = sheet.getRange("A2:E118");
  var table = range.getValues();
  var len = table.length;
  var resources = {
    title: null,
    mimeType: MimeType.GOOGLE_DOCS,
    parents: [{id: /* destination folder id */}]
  };
  var count = 0;
  var files = DriveApp.getFolderById(/* source folder id */).getFiles();
  while (files.hasNext()) {
    var blob = files.next().getBlob();
    var blobName = blob.getName();
    for (var i=0; i<len; i++) {
      if (table[i][0] === blobName.slice(5, 18)) {
        if (table[i][4])
          break;
        resources.title = blobName;
        Drive.Files.insert(resources, blob);  // Limit Exceeded: DriveApp. (line 51, file "Code")
        table[i][4] = "yes";
      }
    }

    if (++count === 10) {
      range.setValues(table);
      Logger.log("time's up");
    }
  }
}

Method B: Using Drive.Files.insert and a continuation token

function toDocs() {
  var folder = DriveApp.getFolderById(/* source folder id */);
  var sprop = PropertiesService.getScriptProperties();
  var contToken = sprop.getProperty("contToken");
  var files = contToken ? DriveApp.continueFileIterator(contToken) : folder.getFiles();
  var options = {
    ocr: true
  };
  var resource = {
    title: null,
    mimeType: null,
    parents: [{id: /* destination folder id */}]
  };

  while (files.hasNext()) {
    var blob = files.next().getBlob();
    resource.title = blob.getName();
    resource.mimeType = blob.getContentType();
    Drive.Files.insert(resource, blob, options);  // Limit Exceeded: DriveApp. (line 113, file "Code")
    sprop.setProperty("contToken", files.getContinuationToken());
  }
}

Method C: Using Drive.Files.copy

Credit for this function goes to Tanaike; see his answer below for more details.

function toDocs() {
  var sourceFolderId = /* source folder id */;
  var destinationFolderId = /* destination folder id */;
  var files = DriveApp.getFolderById(sourceFolderId).getFiles();
  while (files.hasNext()) {
    var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
    Logger.log(res);
  }
}

Method D: Sending batches of HTTP requests

Credit for this function goes to Tanaike; see his answer below for more details.

function toDocs() {
  var sourceFolderId = /* source folder id */;
  var destinationFolderId = /* destination folder id */;

  var files = DriveApp.getFolderById(sourceFolderId).getFiles();
  var rBody = [];
  while (files.hasNext()) {
    rBody.push({
      method: "POST",
      endpoint: "https://www.googleapis.com/drive/v3/files/" + files.next().getId() + "/copy",
      requestBody: {
        mimeType: "application/vnd.google-apps.document",
        parents: [destinationFolderId]
      }
    });
  }
  var cycle = 20; // Number of API calls at 1 batch request.
  for (var i = 0; i < Math.ceil(rBody.length / cycle); i++) {
    var offset = i * cycle;
    var body = rBody.slice(offset, offset + cycle);
    var boundary = "xxxxxxxxxx";
    var contentId = 0;
    var data = "--" + boundary + "\r\n";
    body.forEach(function(e){
      data += "Content-Type: application/http\r\n";
      data += "Content-ID: " + ++contentId + "\r\n\r\n";
      data += e.method + " " + e.endpoint + "\r\n";
      data += e.requestBody ? "Content-Type: application/json; charset=utf-8\r\n\r\n" : "\r\n";
      data += e.requestBody ? JSON.stringify(e.requestBody) + "\r\n" : "";
      data += "--" + boundary + "\r\n";
    });
    var options = {
      method: "post",
      contentType: "multipart/mixed; boundary=" + boundary,
      payload: Utilities.newBlob(data).getBytes(),
      headers: {'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()},
      muteHttpExceptions: true,
    };
    var res = UrlFetchApp.fetch("https://www.googleapis.com/batch", options).getContentText();
//    Logger.log(res); // If you use this, please remove the comment.
  }
}

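What worked and what didn't

None of the functions using Drive.Files.insert worked. Every function that used insert for conversion failed with this error:

Limit Exceeded: DriveApp. (line #, file "Code")

(Line number replaced with a generic symbol.) No further details or description of the error could be found. One notable variation was the one where I used a spreadsheet and the PDFs were in a Team Drive folder: while all the other variants failed immediately without converting a single file, this one converted 5 before failing. However, when considering why this variation did better than the others, I think it was more of a fluke than anything to do with the particular resources used (spreadsheet, Team Drive, etc.).
Drive.Files.copy and batch HTTP requests worked only when the source folder was a personal (non-Team Drive) folder.
Attempting to use the copy function while reading from a Team Drive folder fails with this error:

File not found: 1RAGxe9a_-euRpWm3ePrbaGaX5brpmGXu (line #, file "Code")

(Line number replaced with a generic symbol.) The line being referenced is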
```
var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
```
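Using batch HTTP requests while reading from a Team Drive folder does nothing: no doc files are created and no errors are thrown. The function terminates silently without having accomplished anything.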
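
Because the batch function sets muteHttpExceptions: true and leaves its Logger.log(res) line commented out, a failure of an individual request would only be visible in the response text. The following is a minimal sketch of how the per-request status codes could be pulled out of that text. It assumes that each part of the batch response contains a status line beginning with "HTTP/1.1", as in Google's batching guide; the function name extractBatchStatuses and the sample response are hypothetical.

```javascript
// Collect the HTTP status code of every part in a batch response body.
function extractBatchStatuses(responseText) {
  var statuses = [];
  var pattern = /^HTTP\/\d(?:\.\d)? (\d{3})/gm; // one status line per part
  var match;
  while ((match = pattern.exec(responseText)) !== null) {
    statuses.push(Number(match[1]));
  }
  return statuses;
}

// Hypothetical two-part response, shortened for illustration.
var sample = [
  "--batch_abc",
  "Content-Type: application/http",
  "Content-ID: response-1",
  "",
  "HTTP/1.1 200 OK",
  "Content-Type: application/json; charset=UTF-8",
  "",
  "{\"id\": \"newDocId\"}",
  "--batch_abc",
  "Content-Type: application/http",
  "Content-ID: response-2",
  "",
  "HTTP/1.1 404 Not Found",
  "Content-Type: application/json; charset=UTF-8",
  "",
  "{\"error\": {\"code\": 404}}",
  "--batch_abc--"
].join("\r\n");

// Keep only the parts that reported a client or server error.
var failed = extractBatchStatuses(sample).filter(function (code) {
  return code >= 400;
});
```

In the batch function, passing the res string to extractBatchStatuses and logging the result would show whether the Team Drive requests were rejected rather than ignored.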

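Conclusion

If you wish to convert a large number of PDFs to Google Docs or text files, use Drive.Files.copy or send batches of HTTP requests, and make sure the PDFs are stored in a personal drive rather than a Team Drive.

Special thanks to @tehhowch for taking such an avid interest in my question and for repeatedly coming back to provide feedback, and to @Tanaike for providing the code along with explanations that successfully solved my problem (with a caveat; read above for details).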

Answer 1

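You want to convert PDF files in a folder to Google Docs. The PDF files are in a folder of a Team Drive. You want to import the converted files into a folder of your Google Drive. If my understanding is correct, how about this method?

For converting PDF to Google Docs, you can use not only Drive.Files.insert() but also Drive.Files.copy(). The advantages of using Drive.Files.copy() are:

Although Drive.Files.insert() has a size limit of 5 MB, Drive.Files.copy() can handle files larger than 5 MB.
In my environment, it processed files faster than Drive.Files.insert().

For this approach, I would like to propose the following 2 patterns.

Pattern 1: Using Drive API v2

In this case, Drive API v2 of Advanced Google Services is used to convert the files.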

function myFunction() {
  var sourceFolderId = "/* source folder id */";
  var destinationFolderId = "/* dest folder id */";
  var files = DriveApp.getFolderById(sourceFolderId).getFiles();
  while (files.hasNext()) {
    var res = Drive.Files.copy({parents: [{id: destinationFolderId}]}, files.next().getId(), {convert: true, ocr: true});
//    Logger.log(res) // If you use this, please remove the comment.
  }
}

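Pattern 2: Using Drive API v3

In this case, Drive API v3 is used to convert the files. Here I used batch requests, because a single batch request can carry up to 100 API calls in one HTTP call. With this, the issue of API quota may be avoided.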

function myFunction() {
  var sourceFolderId = "/* source folder id */";
  var destinationFolderId = "/* dest folder id */";

  var files = DriveApp.getFolderById(sourceFolderId).getFiles();
  var rBody = [];
  while (files.hasNext()) {
    rBody.push({
      method: "POST",
      endpoint: "https://www.googleapis.com/drive/v3/files/" + files.next().getId() + "/copy",
      requestBody: {
        mimeType: "application/vnd.google-apps.document",
        parents: [destinationFolderId]
      }
    });
  }
  var cycle = 100; // Number of API calls at 1 batch request.
  for (var i = 0; i < Math.ceil(rBody.length / cycle); i++) {
    var offset = i * cycle;
    var body = rBody.slice(offset, offset + cycle);
    var boundary = "xxxxxxxxxx";
    var contentId = 0;
    var data = "--" + boundary + "\r\n";
    body.forEach(function(e){
      data += "Content-Type: application/http\r\n";
      data += "Content-ID: " + ++contentId + "\r\n\r\n";
      data += e.method + " " + e.endpoint + "\r\n";
      data += e.requestBody ? "Content-Type: application/json; charset=utf-8\r\n\r\n" : "\r\n";
      data += e.requestBody ? JSON.stringify(e.requestBody) + "\r\n" : "";
      data += "--" + boundary + "\r\n";
    });
    var options = {
      method: "post",
      contentType: "multipart/mixed; boundary=" + boundary,
      payload: Utilities.newBlob(data).getBytes(),
      headers: {'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()},
      muteHttpExceptions: true,
    };
    var res = UrlFetchApp.fetch("https://www.googleapis.com/batch", options).getContentText();
//    Logger.log(res); // If you use this, please remove the comment.
  }
}
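
The payload assembly in the function above can also be pulled out into plain functions, which makes the chunk size and the multipart layout easier to check in isolation. The following is a minimal sketch rather than the code from this answer: the names chunk and buildBatchPayload are hypothetical, and unlike the function above it ends the payload with a closing boundary delimiter (the boundary followed by two hyphens), as the multipart format specifies.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build one multipart/mixed payload from request descriptors shaped like
// the rBody entries above: {method, endpoint, requestBody}.
function buildBatchPayload(requests, boundary) {
  var parts = requests.map(function (req, index) {
    var part = "--" + boundary + "\r\n" +
      "Content-Type: application/http\r\n" +
      "Content-ID: " + (index + 1) + "\r\n\r\n" +
      req.method + " " + req.endpoint + "\r\n";
    if (req.requestBody) {
      part += "Content-Type: application/json; charset=utf-8\r\n\r\n" +
        JSON.stringify(req.requestBody) + "\r\n";
    } else {
      part += "\r\n";
    }
    return part;
  });
  // Closing delimiter: the boundary followed by two hyphens.
  return parts.join("") + "--" + boundary + "--\r\n";
}
```

With 117 files and a chunk size of 100, chunk yields two groups of 100 and 17 requests. Each payload would then be sent with UrlFetchApp.fetch as in the function above, one call per group.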

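Note:

If 100 API calls per batch request (the current value) is too many, please lower var cycle = 100.
If Drive API v3 cannot be used with Team Drive, please tell me. I can convert this to Drive API v2.
If Team Drive is the cause of the problem in your situation, can you try this after copying the PDF files to your Google Drive?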
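
If Team Drive does turn out to be the cause, one thing that may be worth trying before moving the files is the supportsAllDrives parameter of the Drive API (named supportsTeamDrives in older versions of the API), which tells the API that the caller handles shared-drive items. The following is an untested sketch, not something confirmed in this thread: the function name copyAsDoc is hypothetical, and the Drive advanced service is passed in as an argument only so that the function can be exercised with a stub.

```javascript
// Copy one file as a Google Doc, opting in to shared-drive (Team Drive) items.
// `driveService` is expected to be the Drive advanced service (v2).
function copyAsDoc(driveService, fileId, destinationFolderId) {
  var resource = {parents: [{id: destinationFolderId}]};
  var optionalArgs = {
    convert: true,
    ocr: true,
    supportsAllDrives: true // assumption: the opt-in that Team Drive files need
  };
  return driveService.Files.copy(resource, fileId, optionalArgs);
}
```

Inside the loop of Pattern 1 this would be called as copyAsDoc(Drive, files.next().getId(), destinationFolderId).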

References:

Batching Requests

If these were not useful for you, I'm sorry.

Answer 2

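You can first get the IDs of all the files and store them in a Google Sheet. You can then process each file as usual using its ID, and mark it as processed once it is done. Before processing a file, check whether it has already been processed.

If there are many files, you can also store the row number up to which you have processed, and continue from there on the next run.

Finally, create a trigger that executes every 10 minutes or so.

This way you can overcome the execution time limit of a single run. API request quotas and the like are not bypassed by this method.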
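
The bookkeeping described above can be sketched as a small loop that skips finished files and stops before the time budget runs out. This is only an illustration under stated assumptions: the name runWithinBudget and its parameters are hypothetical, the processed map stands in for the "processed" column of the sheet, and the conversion itself is passed in as a callback.

```javascript
// Process pending file IDs until the time budget runs out.
// `processed` maps file ID -> true for files finished in earlier runs,
// `convert` does the real work for one ID, `now` returns the time in ms.
function runWithinBudget(fileIds, processed, convert, budgetMs, now) {
  var start = now();
  var done = 0;
  for (var i = 0; i < fileIds.length; i++) {
    var id = fileIds[i];
    if (processed[id]) {
      continue; // finished in an earlier run
    }
    if (now() - start >= budgetMs) {
      break; // leave the rest for the next triggered run
    }
    convert(id);
    processed[id] = true; // mark only after the conversion returned
    done++;
  }
  return done; // number of files converted in this run
}
```

In Apps Script, now could be Date.now, convert could wrap a Drive.Files.copy call, and the processed map would be read from and written back to the sheet at the start and end of each triggered run.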

Converting PDFs to Google Docs using Drive API / DriveApp
