I changed the lifecycle on a bunch of my buckets on Amazon S3 so their storage class was set to Glacier. I did this using the online AWS Console. I now need those files again.
I know how to restore them back to S3 per file. But my buckets have thousands of files. I wanted to see if there is a way to restore the entire bucket back to S3, just like there was a way to send the entire bucket to Glacier?
I'm guessing there is a way to program a solution, but I wanted to see if there is a way to do it in the Console. Or with another program? Or anything else I might be missing?
Answer 0 (Score: 53)
If you're using s3cmd, you can use it to restore recursively quite easily:
s3cmd restore --recursive s3://mybucketname/
I've also used it to restore just folders:
s3cmd restore --recursive s3://mybucketname/folder/
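Depending on your s3cmd version, restore may also accept a --restore-days option controlling how long the restored copies stay available (check s3cmd restore --help on your install); a hedged example:
s3cmd restore --recursive --restore-days=7 s3://mybucketname/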
Answer 1 (Score: 30)
If you're using the AWS CLI tool (it's nice, you should use it), you can do it like this:
aws s3 ls s3://<bucket_name> | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request Days=<days> --bucket <bucket_name> --key
Replace <bucket_name> with the name of the bucket you want, and <days> with the number of days you want the objects to be restored for.
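For example, with a hypothetical bucket named my-bucket and a 7-day restore window, the filled-in command would look like:
aws s3 ls s3://my-bucket | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request Days=7 --bucket my-bucket --key
Note that the $4 field from aws s3 ls breaks for keys that contain spaces; see the variation further down that handles those.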
Answer 2 (Score: 13)
The above answers didn't work well for me because my bucket was a mix of objects in Glacier and objects that were not. The easiest thing for me was to create a list of all GLACIER objects in the bucket, then attempt to restore each one individually, ignoring any errors (like "already in progress", "not an object", etc.).
Get a list of all GLACIER files (keys) in the bucket:
aws s3api list-objects-v2 --bucket <bucketName> --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' > glacier-restore.txt
Create a shell script and run it, replacing "bucketName" with your own:
#!/bin/sh
for x in `cat glacier-restore.txt`
do
echo "Begin restoring $x"
aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "$x"
echo "Done restoring $x"
done
Credit to Josh at http://capnjosh.com/blog/a-client-error-invalidobjectstate-occurred-when-calling-the-copyobject-operation-operation-is-not-valid-for-the-source-objects-storage-class/, a resource I found after trying some of the solutions above.
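A variation not in the original answer: if the list is long, the one-at-a-time loop above can be slow, and GNU xargs can issue several restore requests in parallel instead (here 8 at a time; -I {} keeps keys with spaces intact):
xargs -P 8 -I {} aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "{}" < glacier-restore.txt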
Answer 3 (Score: 12)
There is no built-in tool for this. "Folders" in S3 are an illusion for human convenience, based on forward slashes in the object key (path/filename), and every object that migrated to Glacier has to be restored individually, although...
Of course you could write a script to iterate through the hierarchy and send off those restore requests using the SDKs or the REST API in your programming language of choice.
Be sure you understand how restoring from Glacier into S3 works before you proceed. It is always only a temporary restoration, and you choose the number of days that each object will persist in S3 before reverting to being stored only in Glacier.
Also, be certain you understand the penalties for restoring too much Glacier data in a short period of time, or you could be in for some unexpected expense. Depending on the urgency, you may want to spread the restore operation out over days or weeks.
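One way to keep those costs down (an addition, not part of the original answer): the restore request can specify a retrieval tier via GlacierJobParameters, so you can ask for the slower but cheaper Bulk tier instead of the default Standard tier. A sketch with a hypothetical bucket and key:
aws s3api restore-object \
  --bucket my-bucket \
  --key path/to/object \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}}'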
Answer 4 (Score: 4)
I recently needed to restore a whole bucket and all its files and folders. You will need the s3cmd and aws cli tools configured with your credentials to run this.
I've found this pretty robust at handling errors for objects in the bucket that may already have a restore request in place.
#!/bin/sh
# This will give you a nice list of all objects in the bucket with the bucket name stripped out
s3cmd ls -r s3://<your-bucket-name> | awk '{print $4}' | sed 's#s3://<your-bucket-name>/##' > glacier-restore.txt
for x in `cat glacier-restore.txt`
do
echo "restoring $x"
aws s3api restore-object --restore-request Days=7 --bucket <your-bucket-name> --profile <your-aws-credentials-profile> --key "$x"
done
Answer 5 (Score: 4)
Here is my version of the aws cli approach and how to restore data from Glacier. I modified some of the examples above so it works when the keys of the files to be restored contain spaces.
# Parameters
BUCKET="my-bucket" # the bucket you want to restore, no s3:// no slashes
BPATH="path/in/bucket/" # the objects prefix you wish to restore (mind the `/`)
DAYS=1 # For how many days you wish to restore the data.
# Restore the objects
aws s3 ls s3://${BUCKET}/${BPATH} --recursive | \
awk '{out=""; for(i=4;i<=NF;i++){out=out" "$i}; print out}'| \
xargs -I {} aws s3api restore-object --restore-request Days=${DAYS} \
--bucket ${BUCKET} --key "{}"
Answer 6 (Score: 2)
It looks like S3 Browser can "Restore from Glacier" at the folder level, but not at the bucket level. The only thing is you have to buy the Pro version, so it's not the best solution.
Answer 7 (Score: 1)
A variation on Dustin's AWS CLI answer, but with recursion and piping to sh so it skips errors (for example if some objects have already had a restore requested...):
BUCKET=my-bucket
BPATH=/path/in/bucket
DAYS=1
aws s3 ls s3://$BUCKET$BPATH --recursive | awk '{print $4}' | xargs -L 1 \
echo aws s3api restore-object --restore-request Days=$DAYS \
--bucket $BUCKET --key | sh
The xargs echo bit generates a list of "aws s3api restore-object" commands, and by piping that to sh you can continue on error.
NOTE: Ubuntu 14.04's aws-cli package is old. In order to use --recursive you'll need to install via github.
POSTSCRIPT: Glacier restores can get unexpectedly pricey really quickly. Depending on your use case, you may find the Infrequent Access tier more appropriate. AWS have a nice explanation of the different tiers.
Answer 8 (Score: 0)
Another way is rclone. This tool can sync / copy / push data (the same way we would with files): https://rclone.org/faq/#can-rclone-sync-directly-from-drive-to-s3 (the linked example is for Google Drive, but it is agnostic). However, as Michael - sqlbot said, a server or a container has to kick off the sync/backup operation somewhere.
Answer 9 (Score: 0)
This command worked for me:
aws s3api list-objects-v2 \
--bucket BUCKET_NAME \
--query "Contents[?StorageClass=='GLACIER']" \
--output text | \
awk -F $'\t' '{print $2}' | \
tr '\n' '\0' | \
xargs -L 1 -0 \
aws s3api restore-object \
--restore-request Days=7 \
--bucket BUCKET_NAME \
--key
ProTip: If a restore is already underway for an object, it is in the RestoreAlreadyInProgress state and the command cannot be re-run for that object until the state transitions, which can take several hours. You'll see this error message if you need to wait: An error occurred (RestoreAlreadyInProgress) when calling the RestoreObject operation
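To check whether a given object's restore has finished before retrying (my addition, not part of the original answer), aws s3api head-object reports the Restore field:
aws s3api head-object --bucket BUCKET_NAME --key path/to/object
# While the restore is running, the output includes: "Restore": "ongoing-request=\"true\""
# Once it finishes, ongoing-request flips to "false" and an expiry-date is included.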
Answer 10 (Score: 0)
I've been through this mill today and came up with the following, based on the answers above and having also tried s3cmd. s3cmd does not work for mixed buckets (Glacier and Standard). This does what you need in two steps: first create a list of the Glacier files, then fire off the s3 cli requests (even if they have already occurred). It also keeps track of which have already been requested, so you can restart the script as needed. Watch out for the TAB (\t) in the cut command quoted below:
#!/bin/sh
bucket="$1"
glacier_file_list="glacier-restore-me-please.txt"
glacier_file_done="glacier-requested-restore-already.txt"
if [ "X${bucket}" = "X" ]
then
echo "Please supply bucket name as first argument"
exit 1
fi
aws s3api list-objects-v2 --bucket ${bucket} --query "Contents[?StorageClass=='GLACIER']" --output text |cut -d '\t' -f 2 > ${glacier_file_list}
if [ $? -ne 0 ]
then
echo "Failed to fetch list of objects from bucket ${bucket}"
exit 1
fi
echo "Got list of glacier files from bucket ${bucket}"
while read x
do
echo "Begin restoring $x"
aws s3api restore-object --restore-request Days=7 --bucket ${bucket} --key "$x"
if [ $? -ne 0 ]
then
echo "Failed to restore \"$x\""
else
echo "Done requested restore of \"$x\""
fi
# Log those done
#
echo "$x" >> ${glacier_file_done}
done < ${glacier_file_list}
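Assuming you save the script as, say, glacier-restore.sh (the name is just for illustration), you would run it with the bucket name as the only argument:
sh glacier-restore.sh my-bucket-name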
Answer 11 (Score: 0)
I wrote a program in Python to recursively restore folders. The s3cmd command above did not work for me, and neither did the awk command.
You can run it like python3 /home/ec2-user/recursive_restore.py -- restore, and monitor the restore status with python3 /home/ec2-user/recursive_restore.py -- status.
import argparse
import base64
import json
import os
import sys
from datetime import datetime
from pathlib import Path
import boto3
from botocore.exceptions import ClientError
__author__ = "kyle.bridenstine"
def reportStatuses(
operation,
type,
successOperation,
folders,
restoreFinished,
restoreInProgress,
restoreNotRequestedYet,
restoreStatusUnknown,
skippedFolders,
):
"""
reportStatuses gives a generic, aggregated report for all operations (Restore, Status, Download)
"""
report = 'Status Report For "{}" Operation. Of the {} total {}, {} are finished being {}, {} have a restore in progress, {} have not been requested to be restored yet, {} reported an unknown restore status, and {} were asked to be skipped.'.format(
operation,
str(len(folders)),
type,
str(len(restoreFinished)),
successOperation,
str(len(restoreInProgress)),
str(len(restoreNotRequestedYet)),
str(len(restoreStatusUnknown)),
str(len(skippedFolders)),
)
if (len(folders) - len(skippedFolders)) == len(restoreFinished):
print(report)
print("Success: All {} operations are complete".format(operation))
else:
if (len(folders) - len(skippedFolders)) == len(restoreNotRequestedYet):
print(report)
print("Attention: No {} operations have been requested".format(operation))
else:
print(report)
print("Attention: Not all {} operations are complete yet".format(operation))
def status(foldersToRestore, restoreTTL):
s3 = boto3.resource("s3")
folders = []
skippedFolders = []
# Read the list of folders to process
with open(foldersToRestore, "r") as f:
for rawS3Path in f.read().splitlines():
folders.append(rawS3Path)
s3Bucket = "put-your-bucket-name-here"
maxKeys = 1000
# Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)
# Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
client = boto3.client("s3")
paginator = client.get_paginator("list_objects")
operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
page_iterator = paginator.paginate(**operation_parameters)
pageCount = 0
totalS3ObjectKeys = []
totalS3ObjKeysRestoreFinished = []
totalS3ObjKeysRestoreInProgress = []
totalS3ObjKeysRestoreNotRequestedYet = []
totalS3ObjKeysRestoreStatusUnknown = []
# Iterate through the pages of S3 Object Keys
for page in page_iterator:
for s3Content in page["Contents"]:
s3ObjectKey = s3Content["Key"]
# Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
if s3ObjectKey.endswith("/"):
continue
totalS3ObjectKeys.append(s3ObjectKey)
s3Object = s3.Object(s3Bucket, s3ObjectKey)
if s3Object.restore is None:
totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
elif "true" in s3Object.restore:
totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
elif "false" in s3Object.restore:
totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
else:
totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
pageCount = pageCount + 1
# Report the total statuses for the folders
reportStatuses(
"restore folder " + rawS3Path,
"files",
"restored",
totalS3ObjectKeys,
totalS3ObjKeysRestoreFinished,
totalS3ObjKeysRestoreInProgress,
totalS3ObjKeysRestoreNotRequestedYet,
totalS3ObjKeysRestoreStatusUnknown,
[],
)
def removeS3BucketPrefixFromPath(path, bucket):
"""
removeS3BucketPrefixFromPath removes "s3a://<bucket name>" or "s3://<bucket name>" from the Path
"""
s3BucketPrefix1 = "s3a://" + bucket + "/"
s3BucketPrefix2 = "s3://" + bucket + "/"
if path.startswith(s3BucketPrefix1):
# remove one instance of prefix
return path.replace(s3BucketPrefix1, "", 1)
elif path.startswith(s3BucketPrefix2):
# remove one instance of prefix
return path.replace(s3BucketPrefix2, "", 1)
else:
return path
def restore(foldersToRestore, restoreTTL):
"""
restore initiates a restore request on one or more folders
"""
print("Restore Operation")
s3 = boto3.resource("s3")
bucket = s3.Bucket("put-your-bucket-name-here")
folders = []
skippedFolders = []
# Read the list of folders to process
with open(foldersToRestore, "r") as f:
for rawS3Path in f.read().splitlines():
folders.append(rawS3Path)
# Skip folders that are commented out of the file
if "#" in rawS3Path:
print("Skipping this folder {} since it's commented out with #".format(rawS3Path))
folders.append(rawS3Path)
continue
else:
print("Restoring folder {}".format(rawS3Path))
s3Bucket = "put-your-bucket-name-here"
maxKeys = 1000
# Remove the S3 Bucket Prefix to get just the S3 Path i.e., the S3 Objects prefix and key name
s3Path = removeS3BucketPrefixFromPath(rawS3Path, s3Bucket)
print("s3Bucket={}, s3Path={}, maxKeys={}".format(s3Bucket, s3Path, maxKeys))
# Construct an S3 Paginator that returns pages of S3 Object Keys with the defined prefix
client = boto3.client("s3")
paginator = client.get_paginator("list_objects")
operation_parameters = {"Bucket": s3Bucket, "Prefix": s3Path, "MaxKeys": maxKeys}
page_iterator = paginator.paginate(**operation_parameters)
pageCount = 0
totalS3ObjectKeys = []
totalS3ObjKeysRestoreFinished = []
totalS3ObjKeysRestoreInProgress = []
totalS3ObjKeysRestoreNotRequestedYet = []
totalS3ObjKeysRestoreStatusUnknown = []
# Iterate through the pages of S3 Object Keys
for page in page_iterator:
print("Processing S3 Key Page {}".format(str(pageCount)))
s3ObjectKeys = []
s3ObjKeysRestoreFinished = []
s3ObjKeysRestoreInProgress = []
s3ObjKeysRestoreNotRequestedYet = []
s3ObjKeysRestoreStatusUnknown = []
for s3Content in page["Contents"]:
print("Processing S3 Object Key {}".format(s3Content["Key"]))
s3ObjectKey = s3Content["Key"]
# Folders show up as Keys but they cannot be restored or downloaded so we just ignore them
if s3ObjectKey.endswith("/"):
print("Skipping this S3 Object Key because it's a folder {}".format(s3ObjectKey))
continue
s3ObjectKeys.append(s3ObjectKey)
totalS3ObjectKeys.append(s3ObjectKey)
s3Object = s3.Object(s3Bucket, s3ObjectKey)
print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))
# Ensure this folder was not already processed for a restore
if s3Object.restore is None:
restore_response = bucket.meta.client.restore_object(
Bucket=s3Object.bucket_name, Key=s3Object.key, RestoreRequest={"Days": restoreTTL}
)
print("Restore Response: {}".format(str(restore_response)))
# Refresh object and check that the restore request was successfully processed
s3Object = s3.Object(s3Bucket, s3ObjectKey)
print("{} - {} - {}".format(s3Object.key, s3Object.storage_class, s3Object.restore))
if s3Object.restore is None:
s3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
totalS3ObjKeysRestoreNotRequestedYet.append(s3ObjectKey)
print("%s restore request failed" % s3Object.key)
# Instead of failing the entire job continue restoring the rest of the log tree(s)
# raise Exception("%s restore request failed" % s3Object.key)
elif "true" in s3Object.restore:
print(
"The request to restore this file has been successfully received and is being processed: {}".format(
s3Object.key
)
)
s3ObjKeysRestoreInProgress.append(s3ObjectKey)
totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
elif "false" in s3Object.restore:
print("This file has successfully been restored: {}".format(s3Object.key))
s3ObjKeysRestoreFinished.append(s3ObjectKey)
totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
else:
print(
"Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
)
s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
elif "true" in s3Object.restore:
print("Restore request already received for {}".format(s3Object.key))
s3ObjKeysRestoreInProgress.append(s3ObjectKey)
totalS3ObjKeysRestoreInProgress.append(s3ObjectKey)
elif "false" in s3Object.restore:
print("This file has successfully been restored: {}".format(s3Object.key))
s3ObjKeysRestoreFinished.append(s3ObjectKey)
totalS3ObjKeysRestoreFinished.append(s3ObjectKey)
else:
print(
"Unknown restore status ({}) for file: {}".format(s3Object.restore, s3Object.key)
)
s3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
totalS3ObjKeysRestoreStatusUnknown.append(s3ObjectKey)
# Report the statuses per S3 Key Page
reportStatuses(
"folder-" + rawS3Path + "-page-" + str(pageCount),
"files in this page",
"restored",
s3ObjectKeys,
s3ObjKeysRestoreFinished,
s3ObjKeysRestoreInProgress,
s3ObjKeysRestoreNotRequestedYet,
s3ObjKeysRestoreStatusUnknown,
[],
)
pageCount = pageCount + 1
if pageCount > 1:
# Report the total statuses for the files
reportStatuses(
"restore-folder-" + rawS3Path,
"files",
"restored",
totalS3ObjectKeys,
totalS3ObjKeysRestoreFinished,
totalS3ObjKeysRestoreInProgress,
totalS3ObjKeysRestoreNotRequestedYet,
totalS3ObjKeysRestoreStatusUnknown,
[],
)
def displayError(operation, exc):
"""
displayError displays a generic error message for all failed operation's returned exceptions
"""
print(
'Error! Restore{} failed. Please ensure that you ran the following command "./tools/infra auth refresh" before executing this program. Error: {}'.format(
operation, exc
)
)
def main(operation, foldersToRestore, restoreTTL):
"""
main The starting point of the code that directs the operation to it's appropriate workflow
"""
print(
"{} Starting log_migration_restore.py with operation={} foldersToRestore={} restoreTTL={} Day(s)".format(
str(datetime.now().strftime("%d/%m/%Y %H:%M:%S")), operation, foldersToRestore, str(restoreTTL)
)
)
if operation == "restore":
try:
restore(foldersToRestore, restoreTTL)
except Exception as exc:
displayError("", exc)
elif operation == "status":
try:
status(foldersToRestore, restoreTTL)
except Exception as exc:
displayError("-Status-Check", exc)
else:
raise Exception("%s is an invalid operation. Please choose either 'restore' or 'status'" % operation)
def check_operation(operation):
"""
check_operation validates the runtime input arguments
"""
if operation is None or (
str(operation) != "restore" and str(operation) != "status" and str(operation) != "download"
):
raise argparse.ArgumentTypeError(
"%s is an invalid operation. Please choose either 'restore' or 'status' or 'download'" % operation
)
return str(operation)
# To run use sudo python3 /home/ec2-user/recursive_restore.py -- restore
# -l /home/ec2-user/folders_to_restore.csv
if __name__ == "__main__":
# Form the argument parser.
parser = argparse.ArgumentParser(
description="Restore s3 folders from archival using 'restore' or check on the restore status using 'status'"
)
parser.add_argument(
"operation",
type=check_operation,
help="Please choose either 'restore' to restore the list of s3 folders or 'status' to see the status of a restore on the list of s3 folders",
)
parser.add_argument(
"-l",
"--foldersToRestore",
type=str,
default="/home/ec2-user/folders_to_restore.csv",
required=False,
help="The location of the file containing the list of folders to restore. Put one folder on each line.",
)
parser.add_argument(
"-t",
"--restoreTTL",
type=int,
default=30,
required=False,
help="The number of days you want the filess to remain restored/unarchived. After this period the logs will automatically be rearchived.",
)
args = parser.parse_args()
sys.exit(main(args.operation, args.foldersToRestore, args.restoreTTL))