How can I copy only the missing objects between buckets with the ruby aws-sdk?

Asked: 2013-10-01 01:20:16

Tags: ruby amazon-s3

I wrote a script to copy S3 objects from my production S3 bucket to my development bucket, but it takes a very long time to run because I check whether each object exists individually before copying it. Is there a way to diff the two buckets and copy only the objects I need? Or to copy the whole bucket in one go?

Here is what I have so far:

require 'aws-sdk'
require 'benchmark'

count = 0
puts "COPYING FROM #{prod_bucket} to #{dev_bucket}"
bm = Benchmark.measure do 
  AWS::S3.new.buckets[prod_bucket].objects.each do |o|
    # one HEAD request per object to check whether it already exists in dev
    exists = AWS::S3.new.buckets[dev_bucket].objects[o.key].exists?

    if exists
      puts "Skipping: #{o.key}"
    else
      puts "Copy: #{o.key} (#{count})"
      o.copy_to(o.key, :bucket_name => dev_bucket, :acl => :public_read)
      count += 1
    end
  end
end
puts "Copied #{count} objects in #{bm.real}s"

1 Answer:

Answer 0 (score: 2):

I've never used that gem, but from your code it looks like you can get the keys of all objects stored in a bucket as an array. Load the key lists for both buckets and use simple array operations to determine which files are missing; that avoids the per-object existence check and should be much faster.

# load file lists (the SDK pages through the objects in batches of 1000)
s3 = AWS::S3.new
source_files = s3.buckets[prod_bucket].objects.map(&:key)
target_files = s3.buckets[dev_bucket].objects.map(&:key)

# determine files missing in dev and copy them over
files_to_copy = source_files - target_files
files_to_copy.each_with_index do |file_name, i|
  puts "Copying #{i}/#{files_to_copy.size}: #{file_name}"

  s3.buckets[prod_bucket].objects[file_name]
    .copy_to(file_name, :bucket_name => dev_bucket, :acl => :public_read)
end

# determine files on dev that no longer exist on prod and remove them
files_to_remove = target_files - source_files
files_to_remove.each_with_index do |file_name, i|
  puts "Removing #{i}/#{files_to_remove.size}: #{file_name}"

  s3.buckets[dev_bucket].objects[file_name].delete
end