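How do I upload a large archive to Amazon Glacier using PHP and aws-sdk v3?

Time: 2015-10-16 08:31:21

Tags: php amazon-web-services cpanel aws-sdk amazon-glacier

This is my first time working with anything from Amazon. I am trying to upload multiple files to Amazon Glacier using the PHP SDK V3. The files then need to be combined into one by Amazon.

The files are stored in the home directory of a cPanel account and must be uploaded to Amazon Glacier by a cron job.

I know I have to use the multipart upload method, but I'm not sure which other functions it needs in order to work. I'm also not sure whether the way I calculate and pass the variables is correct.

This is the code I have so far: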

<?php
require 'aws-autoloader.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;

//############################################
//DEFAULT VARIABLES
//############################################
$key = 'XXXXXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';   
$accountId = '123456789123';
$vaultName = 'VaultName';
$partSize = '4194304';
$fileLocation = 'path/to/files/';

//############################################
//DECLARE THE AMAZON CLIENT
//############################################
$client = new GlacierClient([
    'region' => 'us-west-2',
    'version' => '2012-06-01',
    'credentials' => array(
        'key'    => $key,
        'secret' => $secret,
  )
]);

//############################################
//GET THE UPLOAD ID
//############################################
$result = $client->initiateMultipartUpload([
    'partSize' => $partSize,
    'vaultName' => $vaultName
]);
$uploadId = $result['uploadId'];

//############################################
//GET ALL FILES INTO AN ARRAY
//############################################
$files = scandir($fileLocation);
unset($files[0]);
unset($files[1]);
sort($files);

//############################################
//GET SHA256 TREE HASH (CHECKSUM)
//############################################
$th = new TreeHash();
//GET TOTAL FILE SIZE
foreach($files as $part){
    $filesize = filesize($fileLocation.$part);
    $total = $filesize;
    $th = $th->update(file_get_contents($fileLocation.$part));
}
$totalchecksum = $th->complete();

//############################################
//UPLOAD FILES
//############################################
foreach ($files as $key => $part) {
    //HASH CONTENT
    $filesize = filesize($fileLocation.$part);
    $rangeSize = $filesize-1;
    $range = 'bytes 0-'.$rangeSize.'/*';
    $sourcefile = $fileLocation.$part;

    $result = $client->uploadMultipartPart([
        'accountId' => $accountId,
        'checksum' => '',
        'range' => $range,
        'sourceFile' => $sourcefile,
        'uploadId' => $uploadId,
        'vaultName' => $vaultName
    ]);
}

//############################################
//COMPLETE MULTIPART UPLOAD
//############################################
$result = $client->completeMultipartUpload([
    'accountId' => $accountId,
    'archiveSize' => $total,
    'checksum' => $totalchecksum,
    'uploadId' => $uploadId,
    'vaultName' => $vaultName,
]);
?>

The declaration of the new Glacier client seems to work, and I do receive an upload ID, but I'm not 100% sure I'm doing it right. The Amazon Glacier vault that the files should be uploaded to and merged in is still empty, and I don't know whether files only show up once completeMultipartUpload has executed successfully.

I also get the following error when running the code:

Fatal error: Uncaught exception 'Aws\Glacier\Exception\GlacierException' with message 'Error executing "CompleteMultipartUpload" on "https://glacier.us-west-2.amazonaws.com/XXXXXXXXXXXX/vaults/XXXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M"; AWS HTTP error: Client error: 403 InvalidSignatureException (client): The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details. The Canonical String for this request should have been 'POST /XXXXXXXXXXX/vaults/XXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M host:glacier.us-west-2.amazonaws.com x-amz-archive-size:1501297 x-amz-date:20151016T081455Z x-amz-glacier-version:2012-06-01 x-amz-sha256-tree-hash:?[qiuã°²åÁ¹ý+¤Üª¤[; K×T host;x-amz-archive-size;x-amz-date;x-amz-glacier-version;x-am in /home/XXXXXXXXXXXX/public_html/XXXXXXXXXXX/Aws/WrappedHttpHandler.php on line 152

Is there an easier way to do this? I also have full SSH access, if that helps.

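3 answers:

Answer 0: (score: 1)

I think you have misunderstood uploadMultipartPart. uploadMultipartPart means you are uploading one large file in multiple parts. You then call completeMultipartUpload to mark that you have finished uploading that one file.

From your code, it looks like you are uploading multiple files.

You may not actually need uploadMultipartPart.

Maybe you can use the regular "uploadArchive" instead?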

REF:

https://blogs.aws.amazon.com/php/post/Tx7PFHT4OJRJ42/Uploading-Archives-to-Amazon-Glacier-from-PHP
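
A minimal sketch of that suggestion, assuming each file should become its own Glacier archive instead of being merged into one. The helper name `uploadFilesAsArchives` is hypothetical. `$glacier` is expected to be an `Aws\Glacier\GlacierClient`, but it is left untyped so the function can also be exercised with a stand-in object; `vaultName`, `archiveDescription` and `body` are parameters of the SDK's `uploadArchive` operation.

```php
<?php
/**
 * Upload every regular file in a directory as its own Glacier archive.
 * Hypothetical helper, not part of the AWS SDK.
 *
 * @param object $glacier   An Aws\Glacier\GlacierClient (or any object exposing uploadArchive()).
 * @param string $vaultName Name of an existing vault.
 * @param string $directory Directory that holds the files to upload.
 * @return array Map of file name => archive ID reported by Glacier.
 */
function uploadFilesAsArchives($glacier, string $vaultName, string $directory): array
{
    $names = scandir($directory);
    if ($names === false) {
        throw new RuntimeException("Cannot read directory: {$directory}");
    }

    $archiveIds = [];
    foreach ($names as $name) {
        $path = rtrim($directory, '/') . '/' . $name;
        if (!is_file($path)) {
            continue; // skips ".", ".." and sub-directories
        }

        $handle = fopen($path, 'rb');
        if ($handle === false) {
            throw new RuntimeException("Cannot open file: {$path}");
        }

        // Assumption: the file name is plain printable ASCII, so it is
        // acceptable as an archive description.
        $result = $glacier->uploadArchive([
            'vaultName'          => $vaultName,
            'archiveDescription' => $name,
            'body'               => $handle,
        ]);

        if (is_resource($handle)) {
            fclose($handle);
        }

        // Keep the archive ID: it is needed to retrieve or delete the archive later.
        $archiveIds[$name] = $result['archiveId'];
    }

    return $archiveIds;
}
```

With the real SDK this would be called as `uploadFilesAsArchives($client, 'VaultName', 'path/to/files/')`. A single request like this is only practical for reasonably small files; larger ones still need the multipart calls.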

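Answer 1: (score: 1)

I have managed to do this with PHP SDK V3 (version 3), and I kept coming across this question during my research, so I thought I would post my solution as well. Use it at your own risk; there is very little error checking or handling.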

<?php
require 'vendor/autoload.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;


// Create the glacier client to connect with
$glacier = new GlacierClient(array(
      'profile' => 'default',
      'region' => 'us-east-1',
      'version' => '2012-06-01'
      ));

$fileName = '17mb_test_file';         // this is the file to upload
$chunkSize = 1024 * 1024 * pow(2,2);  // 1 MB times a power of 2
$fileSize = filesize($fileName);      // we will need the file size (in bytes)

// initiate the multipart upload
// it is dangerous to send the filename without escaping it first
$result = $glacier->initiateMultipartUpload(array(
      'archiveDescription' => 'A multipart-upload for file: '.$fileName,
      'partSize' => $chunkSize,
      'vaultName' => 'MyVault'
      ));

// we need the upload ID when uploading the parts
$uploadId = $result['uploadId'];

// we need to generate the SHA256 tree hash
// open the file so we can get a hash from its contents
$fp = fopen($fileName, 'r');
// This class can generate the hash
$th = new TreeHash();
// feed in all of the data
$th->update(fread($fp, $fileSize));
// generate the hash (this comes out as binary data)...
$hash = $th->complete();
// but the API needs hex (thanks). PHP to the rescue!
$hash = bin2hex($hash);

// reset the file position indicator
fseek($fp, 0);

// the part counter
$partNumber = 0;

print("Uploading: '".$fileName
    ."' (".$fileSize." bytes) in "
    .(ceil($fileSize/$chunkSize))." parts...\n");
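// stop exactly at the end of the file; comparing against $fileSize + 1 would
// send an extra, empty part when the size is an exact multiple of $chunkSize
while ($partNumber * $chunkSize < $fileSize)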
{
  // while we haven't written everything out yet
  // figure out the offset for the first and last byte of this chunk
  $firstByte = $partNumber * $chunkSize;
  // the last byte for this piece is either the last byte in this chunk, or
  // the end of the file, whichever is less
  // (watch for those Obi-Wan errors)
  $lastByte = min((($partNumber + 1) * $chunkSize) - 1, $fileSize - 1);

  // upload the next piece
  $result = $glacier->uploadMultipartPart(array(
        'body' => fread($fp, $chunkSize),  // read the next chunk
        'uploadId' => $uploadId,          // the multipart upload this is for
        'vaultName' => 'MyVault',
        'range' => 'bytes '.$firstByte.'-'.$lastByte.'/*' // weird string
        ));

  // this is where one would check the results for error.
  // This is left as an exercise for the reader ;)

  // onto the next piece
  $partNumber++;
  print("\tpart ".$partNumber." uploaded...\n");
}
print("...done\n");

// and now we can close off this upload
$result = $glacier->completeMultipartUpload(array(
  'archiveSize' => $fileSize,         // the total file size
  'uploadId' => $uploadId,            // the upload id
  'vaultName' => 'MyVault',
  'checksum' => $hash                 // here is where we need the tree hash
));

// this is where one would check the results for error.
// This is left as an exercise for the reader ;)


// get the archive id.
// You will need this to refer to this upload in the future.
$archiveId = $result->get('archiveId');

print("The archive Id is: ".$archiveId."\n");


?>
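
The byte-range arithmetic in the loop above can be isolated and checked without contacting AWS. The sketch below uses a hypothetical helper, `glacierPartRanges`, which is not part of the SDK. It returns the `range` string for each part, stops exactly at the end of the file, and rejects part sizes that are not 1 MB multiplied by a power of 2.

```php
<?php
/**
 * Build the "range" strings for a Glacier multipart upload.
 * Hypothetical helper, not part of the AWS SDK.
 *
 * @param int $fileSize Total archive size in bytes.
 * @param int $partSize Part size in bytes: 1 MB multiplied by a power of 2.
 * @return string[] One "bytes first-last/*" string per part, in upload order.
 */
function glacierPartRanges(int $fileSize, int $partSize): array
{
    $mb = 1024 * 1024;

    // A positive integer is a power of 2 when exactly one bit is set.
    // Since 1 MB is itself a power of 2, any power of 2 >= 1 MB qualifies.
    $isPowerOfTwo = $partSize > 0 && ($partSize & ($partSize - 1)) === 0;
    if (!$isPowerOfTwo || $partSize < $mb) {
        throw new InvalidArgumentException('Part size must be 1 MB times a power of 2.');
    }

    $ranges = [];
    for ($first = 0; $first < $fileSize; $first += $partSize) {
        // The last part may be shorter than $partSize.
        $last = min($first + $partSize, $fileSize) - 1;
        $ranges[] = "bytes {$first}-{$last}/*";
    }

    return $ranges;
}
```

For example, `glacierPartRanges(filesize($fileName), $chunkSize)` would give one entry per `uploadMultipartPart` call, and an 8 MB file with 4 MB parts yields two ranges, not three.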

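Answer 2: (score: 0)

Note: this solution performs the multipart upload with aws-sdk-php v2. I think it would also work on v3 with only small changes to how the TreeHash class is used.

Thanks to Neil Vandermeiden's snippet, I accomplished the same task, with one improvement.

Neil only computes a checksum over the whole file. That has two possible problems:

  • It can use a lot of memory: remember that we are uploading a large file, and hashing it to get the checksum means opening it and reading its entire contents.
  • We are uploading the file in several parts: something can go wrong while a part is being uploaded, leaving a corrupted part on the AWS side. Computing and verifying the checksum of every part prevents that.

In the code below, we compute the checksum of each file part and send each part to the AWS API together with its checksum.

Once AWS has finished receiving an uploaded part, it computes the checksum of that part itself. If its checksum does not match ours, an exception is thrown. If it matches, we know the part was uploaded successfully.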

<?php
use Aws\Common\Hash\TreeHash;
use Aws\Glacier\GlacierClient;

/**
 * upload a file and store it into aws glacier
 */
class UploadMultipartFileToGlacier
{
    // aws glacier
    private $description;
    private $glacierClient;
    private $glacierConfig;
    /*
     * the part size must be 1 MB (1024 * 1024 bytes) multiplied by a power of 2 (1 MB, 2 MB, 4 MB, 8 MB, and so on)
     * reference: https://docs.aws.amazon.com/aws-sdk-php/v2/api/class-Aws.Glacier.GlacierClient.html#_initiateMultipartUpload
     **/
    private $partSize;

    // file location
    private $filePath;

    private $errorMessage;
    private $executionDate;

    public function __construct($filePath)
    {
        $this->executionDate = date('Y-m-d H:i:s');
        $this->filePath = $filePath;
    
        // AWS Glacier
        $this->glacierConfig = (object) [
            'vaultId' => 'VAULT_NAME',
            'region' => 'REGION',
            'accessKeyId' => 'ACCESS_KEY',
            'secretAccessKey' => 'SECRET_KEY',
        ];

        $this->glacierClient = GlacierClient::factory(array(
            'credentials' => array(
                'key'    => $this->glacierConfig->accessKeyId,
                'secret' => $this->glacierConfig->secretAccessKey,
            ),
            'region' => $this->glacierConfig->region
        ));

        $this->description = sprintf('Upload file %s at %s', $this->filePath, $this->executionDate);

        $this->partSize = 1024 * 1024 * pow(2, 2); // 4 MB
    }

    public function upload()
    {
        list($success, $data) = $this->uploadFileToGlacier();

        if ($success) {
            // todo: tasks to do when the file has been uploaded successfully to aws glacier
        } else {
            // todo: handle error
            // $this->errorMessage contains the exception message
        }
    }

    private function completeMultipartUpload($uploadId, $fileSize, $checksumParts)
    {
        // with the checksums of all processed file parts, we can compute the checksum of the whole file. It's important to
        // send it as a parameter to the aws api's GlacierClient::completeMultipartUpload. AWS computes the checksum of the
        // assembled archive on its side. If that checksum doesn't match ours, the api throws an exception.
        $checksum = $this->getChecksumFile($checksumParts);

        return $this->glacierClient->completeMultipartUpload([
            'archiveSize' => $fileSize,
            'uploadId' => $uploadId,
            'vaultName' => $this->glacierConfig->vaultId,
            'checksum' => $checksum
        ]);
    }

    private function getChecksumPart($content)
    {
        $treeHash = new TreeHash();
        $mb = 1024 * 1024 * pow(2, 0); // 1 MB (the class TreeHash only allows to process chunks <= 1 MB)
        $buffer = $content;

        while (strlen($buffer) >= $mb) {
            $data = substr($buffer, 0, $mb);
            $buffer = substr($buffer, $mb) ?: '';
            $treeHash->addData($data);
        }
        
        if (strlen($buffer)) {
            $treeHash->addData($buffer);
        }

        return $treeHash->getHash();
    }

    private function getChecksumFile($checksumParts)
    {
        $treeHash = TreeHash::fromChecksums($checksumParts);

        return $treeHash->getHash();
    }

    private function initiateMultipartUpload()
    {
        $result = $this->glacierClient->initiateMultipartUpload([
            'accountId' => '-',
            'vaultName' => $this->glacierConfig->vaultId,
            'archiveDescription' => $this->description,
            'partSize' => $this->partSize,
        ]);

        return $result->get('uploadId');
    }

    private function uploadFileToGlacier()
    {
        $success = true;
        $data = false;

        try {
            $fileSize = filesize($this->filePath);

            $uploadId = $this->initiateMultipartUpload();
            $checksums = $this->uploadMultipartFile($uploadId, $fileSize);
            $model = $this->completeMultipartUpload($uploadId, $fileSize, $checksums);

            $data = (object) [
                'archiveId' => $model->get('archiveId'),
                'executionDate' => $this->executionDate,
                'location' => $model->get('location'),
            ];
        } catch (\Exception $e) {
            $this->errorMessage = $e->getMessage();
            $success = false;
        }

        return [$success, $data];
    }
    
    private function uploadMultipartFile($uploadId, $fileSize)
    {
        $numParts = ceil($fileSize / $this->partSize);
        $fp = fopen($this->filePath, 'r');
        $partIdx = 0;
        $checksumParts = [];

        error_log("Uploading: {$this->filePath} ({$fileSize} bytes) in {$numParts} parts...");

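        // stop exactly at the end of the file; comparing against $fileSize + 1 would
        // send an extra, empty part when the size is an exact multiple of the part size
        while ($partIdx * $this->partSize < $fileSize) {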
            $firstByte = $partIdx * $this->partSize;
            $lastByte = min((($partIdx + 1) * $this->partSize) - 1, $fileSize - 1);
            $content = fread($fp, $this->partSize);
            
            // we compute the checksum of the part we're processing. It's important to send it as a parameter to the
            // aws api's GlacierClient::uploadMultipartPart. AWS computes the checksum of the uploaded part on its side.
            // If that checksum doesn't match ours, the api throws an exception.
            $checksumPart = $this->getChecksumPart($content);

            $result = $this->glacierClient->uploadMultipartPart([
                'body' => $content,
                'uploadId' => $uploadId,
                'vaultName' => $this->glacierConfig->vaultId,
                'checksum' => $checksumPart,
                'range' => "bytes {$firstByte}-{$lastByte}/*"
            ]);

            $checksumParts[] = $result->get('checksum'); // same value as $checksumPart; the api throws an exception if they differ
            
            $partIdx++;
            error_log("Part {$partIdx} uploaded...");
        }

        // release the file handle once every part has been sent
        fclose($fp);

        return $checksumParts;
    }
}

$uploadMultipartFileToGlacier = new UploadMultipartFileToGlacier('<FILE_PATH>');

$uploadMultipartFileToGlacier->upload();
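
To make the relationship between the per-part checksums and the final archive checksum concrete, here is a dependency-free sketch of the SHA-256 tree hash, written with PHP's built-in `hash()` instead of the SDK's `TreeHash` class. The function names `combineHashes` and `treeHash` are hypothetical. It illustrates that, when the part size is 1 MB times a power of 2, combining the part checksums gives the same value as hashing the whole file, and that the value passed to the API is the hex form of the digest, not the raw binary.

```php
<?php
/**
 * Reduce a list of binary SHA-256 digests to one tree-hash root by hashing
 * adjacent pairs level by level; an unpaired digest moves up unchanged.
 */
function combineHashes(array $hashes): string
{
    if ($hashes === []) {
        return hash('sha256', '', true); // tree hash of empty input
    }

    while (count($hashes) > 1) {
        $next = [];
        foreach (array_chunk($hashes, 2) as $pair) {
            $next[] = count($pair) === 2
                ? hash('sha256', $pair[0] . $pair[1], true)
                : $pair[0];
        }
        $hashes = $next;
    }

    return $hashes[0];
}

/** Binary tree hash of a string, built from 1 MB leaves. */
function treeHash(string $data): string
{
    $leaves = [];
    foreach (str_split($data, 1024 * 1024) as $chunk) {
        $leaves[] = hash('sha256', $chunk, true);
    }

    return combineHashes($leaves);
}

// Sample data of about 5.3 MB, split into 2 MB parts as an upload would be.
$data       = str_repeat('glacier', 800000);
$parts      = str_split($data, 2 * 1024 * 1024);
$partHashes = array_map('treeHash', $parts);

// bin2hex() gives the form expected in the 'checksum' parameter.
$fromParts = bin2hex(combineHashes($partHashes));
$fromWhole = bin2hex(treeHash($data));

var_dump($fromParts === $fromWhole);
```

For a real large file, the parts would be read with `fread()` one at a time, so that only a single part is held in memory. In SDK v3 the corresponding `Aws\Glacier\TreeHash` methods are presumably `update()`, `addChecksum()` and `complete()` in place of the v2 calls used above; check the SDK reference before relying on that.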