Question

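I already have a Cassandra database backup on AWS S3. The backup is created every day and kept on S3. Now I am looking for a second cloud storage service where I can periodically save a copy of my C* backup from AWS S3. Basically, it is just copying files (about 500 GB in size) from AWS S3 and periodically keeping them somewhere else in the cloud as a second backup.

I am looking for the best option to achieve this, best in terms of cost-effectiveness, flexibility, and developer-friendliness. I need to be able to write a script that copies the latest C* backup from AWS S3 and saves it to the second cloud storage. The script needs to run periodically using a cron job or a rake task.

After some research, I found Rackspace and the newcomer Google Compute Engine, but I am not sure which one to use or how to use it. I am looking for some advice on this. Thanks in advance!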

EDIT_1:

OK, so I tried this command:

gsutil -m rsync -r s3://<s3_bucket_name>  gs://<GS_bucket_name>

I have modified the .boto config file and provided my AWS access key and secret key there.

But when I run the command above, I get the following messages, including an exception:

Building synchronization state...
You have requested multiple threads or processes for an operation, but
the required functionality of Python's multiprocessing module is not
available. Your operations will be performed sequentially, and any
requests for parallelism will be ignored. Your max number of open
files, 0, is too low to allow safe multiprocessing. On Linux you can
fix this by adding something like "ulimit -n 10000" to your ~/.bashrc
or equivalent file, and opening a new terminal. On MacOS you can fix
this by running a command like this once: "launchctl limit maxfiles
10000"
ServiceException: Non-MD5 etag ("3fd6e94275941cf4d33768682cd52363-21") present for key <Key: <my_s3_bucket name>,2014-02-18-05-00/disaster-cassandra-1.1/<s3_project_name>/column_attributes/snapshots/1392699667769/<s3_project_name>-column_attributes-ic-1225-Data.db>, data integrity checks are not possible.
Starting synchronization

What am I missing here? Any ideas?

Answer 1

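Do you have a specific bucket in S3 that needs to be synced periodically to a Google Cloud Storage bucket? That's not too hard. Google Cloud Storage's gsutil command-line utility has an rsync command that synchronizes the contents of two buckets. You can sync everything with this command: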

gsutil rsync -d -r s3://original-bucket gs://google-cloud-bucket

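Set up gsutil, paste that line into a cron script, and you're done. Keep in mind that "-d" means that content deleted from S3 will also be deleted from GCS, which you may not want if you're trying to guard against accidental deletion.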
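A minimal sketch of such a cron script, assuming gsutil is installed and already configured with credentials for both providers. The bucket names are the placeholders from the command above, and the environment variable names (SRC_BUCKET, DST_BUCKET, DRY_RUN) are illustrative, not part of gsutil. It omits -d so that deletions in S3 are not propagated, and it defaults to printing the command instead of running it.

```shell
#!/bin/sh
# Sync an S3 bucket into a Google Cloud Storage bucket with gsutil rsync.
# Bucket names are placeholders; override them through the environment.
SRC_BUCKET="${SRC_BUCKET:-s3://original-bucket}"
DST_BUCKET="${DST_BUCKET:-gs://google-cloud-bucket}"
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 copies.
DRY_RUN="${DRY_RUN:-1}"

# Print the gsutil command line for a given source and destination.
build_sync_cmd() {
    printf 'gsutil rsync -r %s %s\n' "$1" "$2"
}

# Either show or execute the sync, depending on DRY_RUN.
run_sync() {
    if [ "$DRY_RUN" = "1" ]; then
        build_sync_cmd "$SRC_BUCKET" "$DST_BUCKET"
    else
        gsutil rsync -r "$SRC_BUCKET" "$DST_BUCKET"
    fi
}

run_sync
```

A crontab entry such as `0 6 * * * DRY_RUN=0 /path/to/s3_to_gcs_sync.sh` would then run the copy daily at 06:00; the script path and the schedule are hypothetical.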

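(Disclaimer: I have a fairly strong conflict of interest and a bias in favor of Google Cloud Storage, and cannot be counted on to give objective advice about which cloud storage solution is superior.)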

Instructions for installing gsutil: https://developers.google.com/storage/docs/gsutil_install

What are good options for saving a Cassandra backup taken from AWS S3 to a non-AWS location?
