Is there a more efficient way to list the files in an Amazon S3 bucket and extract the metadata for each file? I'm using the AWS PHP SDK.
    if ($paths = $s3->get_object_list('my-bucket')) {
        foreach ($paths as $path) {
            $meta = $s3->get_object_metadata('my-bucket', $path);
            echo $path . ' was modified on ' . $meta['LastModified'] . '<br />';
        }
    }
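Currently I have to run get_object_list() to list all the files, and then call get_object_metadata() for each file to get its metadata.
If I have 100 files in my bucket, that makes 101 calls to get this data. It would be nice if this could be done in a single call.
E.g.: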
    if ($paths = $s3->get_object_list('my-bucket')) {
        foreach ($paths as $path) {
            echo $path['FileName'] . ' was modified on ' . $path['LastModified'] . '<br />';
        }
    }
Answer 0: (score: 3)
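I know this is a bit old, but I ran into the same problem. To solve it, I extended the AWS SDK to use its batch functionality, which makes retrieving custom metadata for a large number of files much faster. Here is my code: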
    /**
     * Name: Steves_Amazon_S3
     *
     * Extends the AmazonS3 class with a method that retrieves a list of
     * files and their custom metadata more efficiently by using
     * CFBatchRequest.
     */
    class Steves_Amazon_S3 extends AmazonS3 {

        public function get_object_metadata_batch($bucket, $filenames, $opt = null) {
            $batch = new CFBatchRequest();

            foreach ($filenames as $filename) {
                // Queue a request for this file's headers
                $this->batch($batch)->get_object_headers($bucket, $filename);
            }

            $response = $this->batch($batch)->send();

            // Fail if any requests were unsuccessful
            if (!$response->areOK()) {
                return false;
            }

            // Initialize so that an empty file list returns an empty array
            $result = array();

            foreach ($response as $file) {
                $temp = array();
                $temp['name'] = (string) basename($file->header['_info']['url']);
                $temp['etag'] = (string) basename($file->header['etag']);
                $temp['size'] = $this->util->size_readable((integer) basename($file->header['content-length']));
                $temp['size_raw'] = basename($file->header['content-length']);
                $temp['last_modified'] = (string) date("jS M Y H:i:s", strtotime($file->header['last-modified']));
                $temp['last_modified_raw'] = strtotime($file->header['last-modified']);
                // Custom metadata may be absent, hence the error suppression
                @$temp['creator_id'] = (string) $file->header['x-amz-meta-creator'];
                @$temp['client_view'] = (string) $file->header['x-amz-meta-client-view'];
                @$temp['user_view'] = (string) $file->header['x-amz-meta-user-view'];
                $result[] = $temp;
            }

            return $result;
        }
    }
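The class above reads three specific x-amz-meta-* headers by name. As a minimal sketch of a more generic approach, the helper below collects every custom metadata header from a header array. The function name extract_custom_metadata() and the sample values are assumptions made for illustration; they are not part of the SDK or of the answer's code.

```php
<?php
/**
 * Collect every x-amz-meta-* header into an array keyed by the short name.
 * extract_custom_metadata() is a hypothetical helper, not part of the SDK.
 */
function extract_custom_metadata(array $headers)
{
    $prefix = 'x-amz-meta-';
    $meta = array();
    foreach ($headers as $name => $value) {
        // Header names are case-insensitive, so normalize before matching
        $name = strtolower($name);
        if (strpos($name, $prefix) === 0) {
            $meta[substr($name, strlen($prefix))] = (string) $value;
        }
    }
    return $meta;
}

// Sample header array (made-up values) shaped like $file->header above
$headers = array(
    'content-length'         => '2048',
    'x-amz-meta-creator'     => '42',
    'x-amz-meta-client-view' => 'yes',
);
print_r(extract_custom_metadata($headers));
```

With a helper like this, a missing custom header simply does not appear in the result, so the @ error suppression would not be needed.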
Answer 1: (score: 2)
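You need to be aware that the list_objects function has a limit: it will not return more than 1000 objects per call, even if the max-keys option is set to a larger number.
To work around this, you need to load the data in several requests: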
    private function _getBucketObjects($prefix = '', $booOneLevelOny = false)
    {
        $objects = array();
        $lastKey = null;

        do {
            $args = array();
            if (isset($lastKey)) {
                $args['marker'] = $lastKey;
            }
            if (strlen($prefix)) {
                $args['prefix'] = $prefix;
            }
            if ($booOneLevelOny) {
                $args['delimiter'] = '/';
            }

            $res = $this->_client->list_objects($this->_bucket, $args);
            if (!$res->isOK()) {
                return null;
            }

            foreach ($res->body->Contents as $object) {
                $objects[] = $object;
                $lastKey = (string)$object->Key;
            }

            $isTruncated = (string)$res->body->IsTruncated;
            unset($res);
        } while ($isTruncated == 'true');

        return $objects;
    }
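The marker loop above can be tried without an S3 connection. The following is a minimal sketch that replaces list_objects with an in-memory stand-in. Both list_all_keys() and the $fetchPage callable, along with its return shape, are hypothetical and exist only for this illustration.

```php
<?php
/**
 * Collect keys from a paginated listing.
 * $fetchPage is a hypothetical callable: it takes the marker (or null) and
 * returns array('keys' => string[], 'truncated' => bool).
 */
function list_all_keys($fetchPage)
{
    $keys = array();
    $marker = null;
    do {
        $page = call_user_func($fetchPage, $marker);
        foreach ($page['keys'] as $key) {
            $keys[] = $key;
            $marker = $key; // the last key seen becomes the next marker
        }
        // Stop on an empty page too, so a bad response cannot loop forever
        $more = $page['truncated'] && count($page['keys']) > 0;
    } while ($more);
    return $keys;
}

// In-memory stand-in for the bucket: five keys served two at a time
$all = array('a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt');
$fakeFetch = function ($marker) use ($all) {
    $start = ($marker === null) ? 0 : array_search($marker, $all) + 1;
    return array(
        'keys'      => array_slice($all, $start, 2),
        'truncated' => ($start + 2) < count($all),
    );
};
print_r(list_all_keys($fakeFetch));
```

The extra check for an empty page is a design choice of this sketch: it guards against a response that reports truncation but returns no keys, in which case the marker would never advance.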
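As a result, you will have the complete list of objects.
What if you have some custom headers?
They are not returned by the list_objects function. In that case, this will help: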
    // $arrObjects is the object list returned by _getBucketObjects()
    $arrHeaders = array();

    foreach (array_chunk($arrObjects, 1000) as $object_set) {
        $batch = new CFBatchRequest();

        foreach ($object_set as $object) {
            if (!$this->isFolder((string)$object->Key)) {
                $this->_client->batch($batch)->get_object_headers($this->_bucket, $this->preparePath((string)$object->Key));
            }
        }

        $response = $this->_client->batch($batch)->send();
        if ($response->areOK()) {
            foreach ($response as $arrHeaderInfo) {
                $arrHeaders[] = $arrHeaderInfo->header;
            }
        }

        unset($batch, $response);
    }
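The grouping step in the snippet above can be sketched on its own. The example below filters out folder placeholders and splits the remaining keys into batches. chunk_file_keys() is a hypothetical helper, and treating keys that end in "/" as folders is an assumption about what the answer's isFolder() method does, since that method is not shown.

```php
<?php
/**
 * Split object keys into batches, leaving out folder placeholders.
 * Treating keys that end in "/" as folders is an assumption about what
 * the answer's isFolder() helper does.
 */
function chunk_file_keys(array $keys, $batchSize = 1000)
{
    $files = array();
    foreach ($keys as $key) {
        if (substr($key, -1) !== '/') {
            $files[] = $key;
        }
    }
    return array_chunk($files, $batchSize);
}

// Made-up keys; a batch size of 2 keeps the output short
$keys = array('docs/', 'docs/a.pdf', 'docs/b.pdf', 'img/', 'img/c.png');
print_r(chunk_file_keys($keys, 2));
```

Filtering before chunking, rather than inside each chunk as the answer does, keeps every batch except the last at full size.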
Answer 2: (score: 0)
I ended up using the list_objects function, which returns the LastModified element I needed.
All in one call :)
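As a rough illustration of this approach: Answer 1 reads the listing through $res->body->Contents, so the same kind of traversal can be shown against a sample listing document without contacting S3. In the sketch below, last_modified_by_key() is a hypothetical helper, and the XML is a hand-written, trimmed-down sample rather than a real response.

```php
<?php
/**
 * Map each key in a ListBucketResult document to its LastModified value.
 * last_modified_by_key() is a hypothetical helper for illustration.
 */
function last_modified_by_key($xmlString)
{
    $xml = new SimpleXMLElement($xmlString);
    $result = array();
    foreach ($xml->Contents as $object) {
        $result[(string) $object->Key] = (string) $object->LastModified;
    }
    return $result;
}

// Hand-written, trimmed-down sample; a real response has more elements
$sample = <<<XML
<ListBucketResult>
  <Name>my-bucket</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>a.txt</Key><LastModified>2012-01-01T10:00:00.000Z</LastModified></Contents>
  <Contents><Key>b.txt</Key><LastModified>2012-02-01T11:30:00.000Z</LastModified></Contents>
</ListBucketResult>
XML;

foreach (last_modified_by_key($sample) as $key => $modified) {
    echo $key . ' was modified on ' . $modified . "\n";
}
```

Note that this covers only the fields present in the listing, such as the key and modification time. Custom x-amz-meta-* headers are not part of it, as Answer 1 points out.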