Uploading an image to Google Drive for OCR

Date: 2016-03-07 06:25:09

Tags: php google-drive-api ocr

I am trying to upload an image to Google Drive in order to perform optical character recognition (OCR). Here is my code:

require_once('vendor/autoload.php');

// Initialize Google Client
$client_email = 'xxxxxx@yyyyy.iam.gserviceaccount.com';
$private_key = file_get_contents('key.p12');
$scopes = array(
    'https://www.googleapis.com/auth/drive.file'
);
$credentials = new Google_Auth_AssertionCredentials(
    $client_email,
    $scopes,
    $private_key
);
$client = new Google_Client();
$client->setAssertionCredentials($credentials);
if ($client->getAuth()->isAccessTokenExpired()) {
  $client->getAuth()->refreshTokenWithAssertion();
}

// Initialize Google Drive service
$service = new Google_Service_Drive($client);

// Upload File
$file = new Google_Service_Drive_DriveFile();
$file->setName('Test Image for OCR');
$file->setDescription('Test Image for OCR');
$file->setMimeType('image/jpeg');
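try {
  // $filename is the path of the local JPEG to upload (set elsewhere in the script).
  $data = file_get_contents($filename);
  $createdFile = $service->files->create($file, array(
      'data' => $data,
      'mimeType' => 'image/jpeg',
      // Google's upload samples pass uploadType; without it the media may not be sent.
      'uploadType' => 'multipart',
  ));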
  var_dump($createdFile);
  // ===========
  // So, what's next?
  // ===========
} catch(Exception $e) {
  echo 'Error occurred: ' . $e->getMessage();
}

The code above runs without errors, and $createdFile is a valid resource in the form of a Google_Service_Drive_DriveFile object.

Questions:

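  1. I assume the upload succeeded, since create() did not return an error. However, I cannot see the uploaded files in my Google Drive. Shouldn't they be uploaded to the root folder of my Google Drive?

  2. How do I perform OCR? I read here that there is a parameter called ocrLanguage. Where should I put it, and how do I get the result?

Thanks in advance.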

Update: the var_dump() output is as follows:

    object(Google_Service_Drive_DriveFile)#18 (55) {
      ["collection_key":protected]=>
      string(6) "spaces"
      ["internal_gapi_mappings":protected]=>
      array(0) {
      }
      ["appProperties"]=>
      NULL
      ["capabilitiesType":protected]=>
      string(42) "Google_Service_Drive_DriveFileCapabilities"
      ["capabilitiesDataType":protected]=>
      string(0) ""
      ["contentHintsType":protected]=>
      string(42) "Google_Service_Drive_DriveFileContentHints"
      ["contentHintsDataType":protected]=>
      string(0) ""
      ["createdTime"]=>
      NULL
      ["description"]=>
      NULL
      ["explicitlyTrashed"]=>
      NULL
      ["fileExtension"]=>
      NULL
      ["folderColorRgb"]=>
      NULL
      ["fullFileExtension"]=>
      NULL
      ["headRevisionId"]=>
      NULL
      ["iconLink"]=>
      NULL
      ["id"]=>
      string(28) "0B_XXXXX1yjq7dENaQWp4ckZoRk0"
      ["imageMediaMetadataType":protected]=>
      string(48) "Google_Service_Drive_DriveFileImageMediaMetadata"
      ["imageMediaMetadataDataType":protected]=>
      string(0) ""
      ["kind"]=>
      string(10) "drive#file"
      ["lastModifyingUserType":protected]=>
      string(25) "Google_Service_Drive_User"
      ["lastModifyingUserDataType":protected]=>
      string(0) ""
      ["md5Checksum"]=>
      NULL
      ["mimeType"]=>
      string(10) "image/jpeg"
      ["modifiedByMeTime"]=>
      NULL
      ["modifiedTime"]=>
      NULL
      ["name"]=>
      string(18) "Test Image for OCR"
      ["originalFilename"]=>
      NULL
      ["ownedByMe"]=>
      NULL
      ["ownersType":protected]=>
      string(25) "Google_Service_Drive_User"
      ["ownersDataType":protected]=>
      string(5) "array"
      ["parents"]=>
      NULL
      ["permissionsType":protected]=>
      string(31) "Google_Service_Drive_Permission"
      ["permissionsDataType":protected]=>
      string(5) "array"
      ["properties"]=>
      NULL
      ["quotaBytesUsed"]=>
      NULL
      ["shared"]=>
      NULL
      ["sharedWithMeTime"]=>
      NULL
      ["sharingUserType":protected]=>
      string(25) "Google_Service_Drive_User"
      ["sharingUserDataType":protected]=>
      string(0) ""
      ["size"]=>
      NULL
      ["spaces"]=>
      NULL
      ["starred"]=>
      NULL
      ["thumbnailLink"]=>
      NULL
      ["trashed"]=>
      NULL
      ["version"]=>
      NULL
      ["videoMediaMetadataType":protected]=>
      string(48) "Google_Service_Drive_DriveFileVideoMediaMetadata"
      ["videoMediaMetadataDataType":protected]=>
      string(0) ""
      ["viewedByMe"]=>
      NULL
      ["viewedByMeTime"]=>
      NULL
      ["viewersCanCopyContent"]=>
      NULL
      ["webContentLink"]=>
      NULL
      ["webViewLink"]=>
      NULL
      ["writersCanShare"]=>
      NULL
      ["modelData":protected]=>
      array(0) {
      }
      ["processed":protected]=>
      array(0) {
      }
    }
    

The file can be retrieved with $service->files->get($file_id), but it is not visible in my Google Drive. The returned file resource object does not contain anything useful either.
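A likely reason the returned object looks empty: in Drive v3, files.get and files.create return only a partial resource (the var_dump above has just id, kind, mimeType and name filled in) unless a fields parameter asks for more. The sketch below only builds the REST URL; FILE_ID and the field list are placeholders. With google-api-php-client the same option should go into the optional-parameters array, e.g. $service->files->get($file_id, array('fields' => 'id,name,parents,owners')).

```php
<?php
// Drive v3 returns a partial resource (id, name, mimeType, kind) unless the
// request names the fields it wants. This only builds the URL; no call is made.

function buildGetUrl(string $fileId, array $fields): string
{
    // "fields" takes a comma-separated list; http_build_query URL-encodes it.
    return 'https://www.googleapis.com/drive/v3/files/' . rawurlencode($fileId)
        . '?' . http_build_query(['fields' => implode(',', $fields)]);
}

// FILE_ID is a placeholder; "owners" would show which account owns the file.
echo buildGetUrl('FILE_ID', ['id', 'name', 'parents', 'owners']), "\n";
```

Requesting owners is a quick way to see whether the file belongs to the service account rather than to you.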

3 Answers:

Answer 0 (score: 3)

I just found a way to do OCR in V3:

  1. Upload the image
  2. Copy the image to a Google Doc, using mimeType "application/vnd.google-apps.document"
  3. Export the document as plain text, using mimeType "text/plain"

P.S. It seems step 2 doesn't work with "appDataFolder".
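The copy step is where ocrLanguage goes: it is a query parameter of files.copy, while the target Google Docs mimeType goes into the new file's metadata. As a rough PHP sketch for the original question, the function below only builds the raw Drive v3 REST request (no HTTP call is made); FILE_ID, the name and the language code are placeholders. With google-api-php-client the equivalent should be something like $service->files->copy($fileId, $copyMetadata, array('ocrLanguage' => 'zh')), where $copyMetadata is a Google_Service_Drive_DriveFile whose mimeType is set to the Google Docs type.

```php
<?php
// Step 2 as a raw Drive v3 request: copy the uploaded image into a Google Doc,
// which runs OCR. Only method, URL and JSON body are built; nothing is sent.

function buildOcrCopyRequest(string $fileId, string $ocrLanguage, string $name): array
{
    // ocrLanguage is a query parameter of files.copy, not part of the metadata.
    $query = http_build_query([
        'ocrLanguage' => $ocrLanguage,
        'fields'      => 'id',
    ]);

    // Requesting the Google Docs MIME type is what triggers the conversion.
    $body = json_encode([
        'name'     => $name,
        'mimeType' => 'application/vnd.google-apps.document',
    ], JSON_UNESCAPED_SLASHES);

    return [
        'method' => 'POST',
        'url'    => 'https://www.googleapis.com/drive/v3/files/' . rawurlencode($fileId) . '/copy?' . $query,
        'body'   => $body,
    ];
}

$req = buildOcrCopyRequest('FILE_ID', 'zh', 'Test Image for OCR');
echo $req['method'], ' ', $req['url'], "\n", $req['body'], "\n";
```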

    UserCredential credential = null;
    try
    {
        credential = await GoogleWebAuthorizationBroker.AuthorizeAsync(
            new Uri("ms-appx:///Assets/client_secret.json"), 
            new[] { DriveService.Scope.DriveFile }, "user", CancellationToken.None);
    }
    catch (AggregateException ex)
    {
        Debug.Write("Credential failed, " + ex.Message);
    }
    
    // Create Drive API service.
    var service = new DriveService(new BaseClientService.Initializer()
    {
        HttpClientInitializer = credential,
        ApplicationName = "TestApp",
    });
    
    // Create folder
    var folderMetadata = new Google.Apis.Drive.v3.Data.File();
    folderMetadata.Name = "NewFolder";
    folderMetadata.MimeType = "application/vnd.google-apps.folder";
    var request = service.Files.Create(folderMetadata);
    request.Fields = "id";
    var folder = request.Execute();
    Debug.WriteLine("Folder ID: " + folder.Id);
    
    // Upload the image file
    var fileMetadata = new Google.Apis.Drive.v3.Data.File();
    fileMetadata.Name = inputFile.Name;
    fileMetadata.Parents = new List<string> { folder.Id };
    FilesResource.CreateMediaUpload requestUpload;
    using (var stream = new System.IO.FileStream(inputFile.Path, System.IO.FileMode.Open))
    {
        requestUpload = service.Files.Create(fileMetadata, stream, "image/jpeg");
        requestUpload.Fields = "id";
        requestUpload.Upload();
    }
    var imgFile = requestUpload.ResponseBody;
    Debug.WriteLine("File ID: " + imgFile.Id);
    
    // Copy image and paste as document
    var textMetadata = new Google.Apis.Drive.v3.Data.File();
    textMetadata.Name = inputFile.Name;
    textMetadata.Parents = new List<string> { folder.Id };
    textMetadata.MimeType = "application/vnd.google-apps.document";
    FilesResource.CopyRequest requestCopy = service.Files.Copy(textMetadata, imgFile.Id);
    requestCopy.Fields = "id";
    requestCopy.OcrLanguage = "zh";
    var textFile = requestCopy.Execute();
    
    // Now we export document as plain text
    FilesResource.ExportRequest requestExport = service.Files.Export(textFile.Id, "text/plain");
    string output = requestExport.Execute();
    
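The last step of the C# code maps to files.export, which works on Google Docs files; that is why the image has to be copied/converted first. A minimal PHP sketch that only builds the REST URL, with DOC_ID as a placeholder for the ID returned by the copy. In google-api-php-client 2.x, Google's own download examples call $service->files->export($docId, 'text/plain', array('alt' => 'media')) and read the text from the body of the returned response.

```php
<?php
// Step 3: export the converted Google Doc as plain text via Drive v3
// files.export. Only the request URL is built; DOC_ID is a placeholder.

function buildExportUrl(string $docId, string $mimeType = 'text/plain'): string
{
    // The target format is a required query parameter of files.export.
    return 'https://www.googleapis.com/drive/v3/files/' . rawurlencode($docId)
        . '/export?' . http_build_query(['mimeType' => $mimeType]);
}

echo buildExportUrl('DOC_ID'), "\n";
```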

Answer 1 (score: 1)

A service account is not you; it is more like a dummy user. It has its own Drive account.

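If you want to upload to your personal account, take the service account's email address and share a directory in your personal Drive account with it, just as you would share a directory or file with any other user.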

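Then you need to find out the directory ID. The way I found to do this is to have the service account run files.list to get a list of everything it now has access to. Once you have found the directory ID (the parent ID), you can change your code above to:

    'data' => $data,
    'mimeType' => 'image/jpeg',
    'parents' => 'the directory id'   <-- this is an educated guess

It looks to me like your code uses the V3 API, which I haven't had time to play with. If it doesn't work, let me know and I'll look into how to pass parents to v3.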
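For what it's worth, in Drive v3 parents belongs to the file metadata as an array of folder IDs, not next to 'data' in the upload options (the C# code in Answer 0 sets fileMetadata.Parents the same way); in the PHP client that would be $file->setParents(array($folderId)). The sketch below builds a files.list URL that searches for a folder by name, and a multipart/related upload body whose metadata carries parents. It only constructs strings; the folder name, FOLDER_ID and the boundary are placeholders.

```php
<?php
// Find a shared folder by name (files.list), then upload into it with the
// folder ID in the metadata's "parents" array. Strings only; nothing is sent.

function buildFolderListUrl(string $folderName): string
{
    // Backslashes and single quotes inside a q value must be escaped.
    $escaped = str_replace(['\\', "'"], ['\\\\', "\\'"], $folderName);
    $q = "mimeType = 'application/vnd.google-apps.folder' and name = '{$escaped}' and trashed = false";
    return 'https://www.googleapis.com/drive/v3/files?'
        . http_build_query(['q' => $q, 'fields' => 'files(id,name)']);
}

function buildMultipartBody(string $name, string $parentId, string $media, string $boundary): string
{
    $metadata = json_encode(['name' => $name, 'parents' => [$parentId]], JSON_UNESCAPED_SLASHES);
    // multipart/related: the JSON metadata part first, then the media part.
    return "--{$boundary}\r\n"
        . "Content-Type: application/json; charset=UTF-8\r\n\r\n"
        . $metadata . "\r\n"
        . "--{$boundary}\r\n"
        . "Content-Type: image/jpeg\r\n\r\n"
        . $media . "\r\n"
        . "--{$boundary}--";
}

echo buildFolderListUrl('OCR uploads'), "\n";

// POST this body to https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart
// with the header "Content-Type: multipart/related; boundary=sep123".
$body = buildMultipartBody('Test Image for OCR', 'FOLDER_ID', 'fake-jpeg-bytes', 'sep123');
echo $body, "\n";
```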

Option 2:

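Another option is for the service account to share its folder with you. You would then have access to that part of its Drive account, and you would be able to see the folder in the web version of Drive. Again, look into permissions; I think you are using V3, which I haven't looked at yet. The difference is where the data is stored and whose storage quota it counts against.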
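Sharing is done with permissions.create on the folder. A minimal sketch of the request the service account would send for option 2; the same request sent by you, with the service account's email address, would script the sharing from the first option instead of using the web UI. FOLDER_ID and the address are placeholders, and nothing is sent. With the PHP client this should correspond to $service->permissions->create($folderId, $permission), using a Google_Service_Drive_Permission with type, role and emailAddress set.

```php
<?php
// Option 2 as a Drive v3 permissions.create request: the folder's owner (here the
// service account) grants one Google account access. Built only, never sent.

function buildShareRequest(string $folderId, string $email, string $role = 'writer'): array
{
    return [
        'method' => 'POST',
        'url'    => 'https://www.googleapis.com/drive/v3/files/' . rawurlencode($folderId) . '/permissions',
        // type "user" plus emailAddress targets one specific account.
        'body'   => json_encode(['type' => 'user', 'role' => $role, 'emailAddress' => $email]),
    ];
}

// FOLDER_ID and the address are placeholders.
$req = buildShareRequest('FOLDER_ID', 'you@example.com');
echo $req['method'], ' ', $req['url'], "\n", $req['body'], "\n";
```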

Answer 2 (score: 0)

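Don't use a service account. If you are uploading to your own account, you just need to obtain a proper access token for your account. Using an intermediate account with a shared folder is pretty ugly (imho).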
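Getting a token for your own account means going through Google's OAuth 2.0 consent flow instead of the service-account assertion. A minimal sketch that builds the authorization URL for the drive.file scope used in the question; CLIENT_ID and the redirect URI are placeholders for values from your own OAuth client setup. google-api-php-client can build the same URL with Google_Client::createAuthUrl() after setting the client ID, redirect URI, scopes and setAccessType('offline').

```php
<?php
// Build the Google OAuth 2.0 authorization URL so the user signs in with their
// own account. CLIENT_ID and the redirect URI are placeholders.

function buildAuthUrl(string $clientId, string $redirectUri): string
{
    return 'https://accounts.google.com/o/oauth2/v2/auth?' . http_build_query([
        'client_id'     => $clientId,
        'redirect_uri'  => $redirectUri,
        'response_type' => 'code',
        // Same scope as the question: access to files this app creates.
        'scope'         => 'https://www.googleapis.com/auth/drive.file',
        // Asks Google to also issue a refresh token when the code is exchanged.
        'access_type'   => 'offline',
    ]);
}

echo buildAuthUrl('CLIENT_ID.apps.googleusercontent.com', 'http://localhost/oauth2callback'), "\n";
```

After consent, Google redirects to the redirect URI with a code parameter, which is exchanged at the token endpoint for an access token; files uploaded with that token land in your own Drive.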