Delete all versions of an object in S3 using Python?

Asked: 2017-10-18 21:32:09

Tags: python amazon-web-services amazon-s3 boto3

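I have a versioned bucket and would like to delete an object (and all of its versions) from it. However, when I try to delete the object from the console, S3 simply adds a delete marker and does not perform a hard delete.

Is it possible to delete all versions of an object with a specific key (a hard delete)? Something like: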

s3resource = boto3.resource('s3')
bucket = s3resource.Bucket('my_bucket')
obj = bucket.Object('my_object_key')

# I would like to delete all versions for the object like so:
obj.delete_all_versions()

# or delete all versions for all objects like so:
bucket.objects.delete_all_versions()

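11 answers:

Answer 0 (score: 2)

The documentation is helpful here:

  1. When versioning is enabled on an S3 bucket, a simple DeleteObject request cannot permanently delete an object from that bucket. Instead, Amazon S3 inserts a delete marker (in effect a new version of the object with its own version ID).
  2. When you try to GET an object whose current version is a delete marker, S3 behaves as though the object has been deleted (even though it has not) and returns a 404 error.
  3. To permanently delete an object from a versioned bucket, use DeleteObject with the relevant version ID for each and every version of the object (including the delete markers).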
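The three points above can be turned into a short routine. This is only a minimal sketch: the function name `hard_delete_key` is made up for illustration, and `client` is assumed to behave like `boto3.client('s3')`. It lists both versions and delete markers for the key, then issues one DeleteObject per version ID.

```python
def hard_delete_key(client, bucket, key):
    """Permanently delete every version and delete marker stored under `key`.

    `client` is assumed to behave like boto3.client('s3') (hypothetical helper,
    not part of boto3). Returns the version IDs that were deleted.
    """
    deleted = []
    paginator = client.get_paginator('list_object_versions')
    # Prefix narrows the listing, but it is a prefix match, so the exact key
    # is checked again below.
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # Either list is simply absent from a page when it is empty.
        entries = page.get('Versions', []) + page.get('DeleteMarkers', [])
        for entry in entries:
            if entry['Key'] != key:
                continue
            client.delete_object(Bucket=bucket, Key=key,
                                 VersionId=entry['VersionId'])
            deleted.append(entry['VersionId'])
    return deleted
```

With real credentials this would be called as `hard_delete_key(boto3.client('s3'), 'my_bucket', 'my_object_key')`.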

Answer 1 (score: 2)

I couldn't get the other solutions to work for this, so here is mine.

import boto3
bucket = "bucket name goes here"
filename = "filename goes here"

client = boto3.client('s3')
paginator = client.get_paginator('list_object_versions')
response_iterator = paginator.paginate(Bucket=bucket)
for response in response_iterator:
    versions = response.get('Versions', [])
    versions.extend(response.get('DeleteMarkers', []))
    for version_id in [x['VersionId'] for x in versions
                       if x['Key'] == filename and x['VersionId'] != 'null']:
        print('Deleting {} version {}'.format(filename, version_id))
        client.delete_object(Bucket=bucket, Key=filename, VersionId=version_id)

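This code handles the cases where:

  • object versioning is not actually turned on
  • there are DeleteMarkers
  • there are no DeleteMarkers
  • the given file has more versions than fit in a single API response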
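For the first case in that list, it can help to check up front whether versioning was ever turned on. A minimal sketch, assuming `client` behaves like `boto3.client('s3')`; the helper name and the 'Unversioned' label are my own. `get_bucket_versioning` omits the 'Status' field when versioning has never been enabled on the bucket.

```python
def versioning_state(client, bucket):
    """Return 'Enabled', 'Suspended', or 'Unversioned' for `bucket`.

    `client` is assumed to behave like boto3.client('s3'). 'Unversioned' is a
    label invented here for the case where S3 returns no 'Status' at all.
    """
    response = client.get_bucket_versioning(Bucket=bucket)
    return response.get('Status', 'Unversioned')
```

Objects written while versioning is off are listed with the version ID 'null', which is the value the loop above filters out.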

Mahesh Mogal's answer does not delete DeleteMarkers. Mangohero1's answer fails if the object has no DeleteMarker. Hari's answer repeats 10 times (to work around its lack of pagination logic).

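Answer 2 (score: 2)

The other answers delete objects one at a time. It is more efficient to use the boto3 delete_objects call and batch your deletions. The code below collects every version and delete marker in the bucket and deletes them in batches of 1000: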

import boto3

bucket = 'bucket-name'
s3_client = boto3.client('s3')
object_response_paginator = s3_client.get_paginator('list_object_versions')

delete_marker_list = []
version_list = []

for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
    if 'DeleteMarkers' in object_response_itr:
        for delete_marker in object_response_itr['DeleteMarkers']:
            delete_marker_list.append({'Key': delete_marker['Key'], 'VersionId': delete_marker['VersionId']})

    if 'Versions' in object_response_itr:
        for version in object_response_itr['Versions']:
            version_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

for i in range(0, len(delete_marker_list), 1000):
    response = s3_client.delete_objects(
        Bucket=bucket,
        Delete={
            'Objects': delete_marker_list[i:i+1000],
            'Quiet': True
        }
    )
    print(response)

for i in range(0, len(version_list), 1000):
    response = s3_client.delete_objects(
        Bucket=bucket,
        Delete={
            'Objects': version_list[i:i+1000],
            'Quiet': True
        }
    )
    print(response)
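For very large buckets, holding every identifier in memory before deleting may be undesirable. Below is a rough variant, not the code from this answer, that flushes a batch as soon as it is full and collects anything S3 reports under 'Errors'. The function name, the `prefix` and `batch_size` parameters are assumptions for illustration, and `client` is assumed to behave like `boto3.client('s3')`.

```python
def delete_all_versions_in_batches(client, bucket, prefix='', batch_size=1000):
    """Delete versions and delete markers under `prefix` in batches.

    `client` is assumed to behave like boto3.client('s3'). Returns the list of
    error entries reported by delete_objects (empty if none were reported).
    """
    paginator = client.get_paginator('list_object_versions')
    errors = []
    batch = []

    def flush():
        # Send whatever is queued; delete_objects accepts at most 1000 keys.
        if not batch:
            return
        response = client.delete_objects(
            Bucket=bucket,
            Delete={'Objects': list(batch), 'Quiet': True},
        )
        # With Quiet=True only failures are reported back.
        errors.extend(response.get('Errors', []))
        batch.clear()

    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for entry in page.get('Versions', []) + page.get('DeleteMarkers', []):
            batch.append({'Key': entry['Key'], 'VersionId': entry['VersionId']})
            if len(batch) == batch_size:
                flush()
    flush()  # send the final, possibly partial, batch
    return errors
```

Checking the returned list matters because delete_objects can succeed as a request while individual keys fail.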

Answer 3 (score: 2)

A solution with fewer lines.

import boto3

def delete_versions(bucket, objects=None): # `objects` is either list of str or None
  bucket = boto3.resource('s3').Bucket(bucket)
  if objects: # delete specified objects
    [version.delete() for version in bucket.object_versions.all() if version.object_key in objects]
  else: # or delete all objects in `bucket`
    [version.delete() for version in bucket.object_versions.all()]

Answer 4 (score: 1)

As a complement to @jarmod's answer, here is the workaround I developed to "hard delete" an object (including its delete markers):

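import boto3

s3 = boto3.client('s3')

def get_all_versions(bucket, filename):
    keys = ["Versions", "DeleteMarkers"]
    results = []
    # Note: a single list_object_versions call returns at most 1000 entries.
    response = s3.list_object_versions(Bucket=bucket, Prefix=filename)
    for k in keys:
        # Either key may be missing from the response, so default to [].
        to_delete = [r["VersionId"] for r in response.get(k, []) if r["Key"] == filename]
        # Extend inside the loop so versions and delete markers are both kept.
        results.extend(to_delete)
    return results

bucket = "YOUR BUCKET NAME"
file = "YOUR FILE"

for version in get_all_versions(bucket, file):
    s3.delete_object(Bucket=bucket, Key=file, VersionId=version)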

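Answer 5 (score: 1)

This post was super helpful; without it we would have spent a great deal of time cleaning up our S3 folders.

We only needed to clean up specific folders, so I tried the code below and it worked like a charm. Also note that I iterate 10 times to get past the 1000-object limit that the listing function has. Feel free to modify the limit.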

import boto3
session = boto3.Session(aws_access_key_id='<YOUR ACCESS KEY>',aws_secret_access_key='<YOUR SECRET KEY>')

bucket_name = '<BUCKET NAME>'
object_name = '<KEY NAME>'

s3 = session.client('s3')

for i in range(10):
    # Each call returns at most 1000 entries, so repeat to work through more.
    versions = s3.list_object_versions(Bucket=bucket_name, Prefix=object_name)
    # 'Versions' and 'DeleteMarkers' are omitted when empty, so default to [].
    version_list = versions.get('Versions', [])
    for version in version_list:
        keyName = version.get('Key')
        versionId = version.get('VersionId')
        print(keyName + ':' + versionId)
        s3.delete_object(Bucket=bucket_name, Key=keyName, VersionId=versionId)
    marker_list = versions.get('DeleteMarkers', [])
    for marker in marker_list:
        keyName1 = marker.get('Key')
        versionId1 = marker.get('VersionId')
        print(keyName1 + ':' + versionId1)
        s3.delete_object(Bucket=bucket_name, Key=keyName1, VersionId=versionId1)
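Instead of a fixed 10 passes, the loop can stop as soon as a listing comes back empty. The following is a sketch of that idea only: the function name and the `max_rounds` safety cap are my own additions, and `client` is assumed to behave like `boto3.client('s3')`.

```python
def delete_until_empty(client, bucket, prefix, max_rounds=1000):
    """Repeat list-then-delete until nothing is left under `prefix`.

    `client` is assumed to behave like boto3.client('s3'). `max_rounds` is a
    hypothetical safety cap so the loop cannot run forever if deletes fail.
    Returns the number of delete_object calls issued.
    """
    deleted = 0
    for _ in range(max_rounds):
        response = client.list_object_versions(Bucket=bucket, Prefix=prefix)
        entries = response.get('Versions', []) + response.get('DeleteMarkers', [])
        if not entries:
            break  # nothing left to delete
        for entry in entries:
            client.delete_object(Bucket=bucket, Key=entry['Key'],
                                 VersionId=entry['VersionId'])
            deleted += 1
    return deleted
```

Because each pass deletes what it just listed, the next listing starts from whatever remains, so no pagination markers are needed.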

Answer 6 (score: 1)

The simplest way:

import boto3
bucket = boto3.resource("s3").Bucket("mybucket")
bucket.object_versions.all().delete()

Answer 7 (score: 1)

To delete all versions of one object, or of several objects that share a prefix:

Pass the object key folder/filename or the prefix folder/subfolder/ to Prefix

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket("my-bucket-name")
bucket.object_versions.filter(Prefix="folder/subfolder/").delete()
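One thing to watch: Prefix is a prefix match, so passing a full key such as folder/filename would also match folder/filename.bak. If only one exact key should go, a small filter on top helps. This is a sketch under the assumption that `bucket` behaves like `boto3.resource('s3').Bucket(name)`; the function name is made up.

```python
def delete_exact_key(bucket, key):
    """Delete every version whose key is exactly `key`.

    `bucket` is assumed to behave like boto3.resource('s3').Bucket(name).
    Returns how many versions were deleted.
    """
    deleted = 0
    # The prefix filter keeps the listing small; the equality check below
    # skips other keys that merely start with the same characters.
    for version in bucket.object_versions.filter(Prefix=key):
        if version.object_key == key:
            version.delete()
            deleted += 1
    return deleted
```

This trades the single batched `.delete()` call for one request per version, which is usually acceptable for a single object.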

Answer 8 (score: 0)

You can use the following code to delete an object along with all of its versions:

import boto3

session = boto3.Session(aws_access_key_id, aws_secret_access_key)

bucket_name = 'bucket_name'
object_name = 'object_name'

s3 = session.client('s3')

versions = s3.list_object_versions (Bucket = bucket_name, Prefix = object_name)
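# 'Versions' is absent when nothing matches, so default to an empty list.
# Note: delete markers (versions.get('DeleteMarkers')) are not removed here.
version_list = versions.get('Versions', [])
for version in version_list:
    versionId = version.get('VersionId')
    s3.delete_object(Bucket = bucket_name, Key= object_name, VersionId = versionId)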

Answer 9 (score: 0)

This script deletes all versions of every object with the given prefix:

s3 = boto3.resource("s3")
client = boto3.client("s3")
s3_bucket = s3.Bucket(bucket_name)
for obj in s3_bucket.objects.filter(Prefix=""):

    response = client.list_object_versions(Bucket=bucket_name, Prefix=obj.key)

    while "Versions" in response:
        to_delete = [
            {"Key": ver["Key"], "VersionId": ver["VersionId"]}
            for ver in response["Versions"]
        ]

        delete = {"Objects": to_delete}

        client.delete_objects(Bucket=bucket_name, Delete=delete)
        response = client.list_object_versions(Bucket=bucket_name, Prefix=obj.key)

    client.delete_object(Bucket=bucket_name, Key=obj.key)

Answer 10 (score: 0)

You can use object_versions.

def delete_all_versions(bucket_name: str, prefix: str):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    if prefix is None:
        bucket.object_versions.delete()
    else:
        bucket.object_versions.filter(Prefix=prefix).delete()

delete_all_versions("my_bucket", None) # empties the entire bucket
delete_all_versions("my_bucket", "my_prefix/") # deletes all objects matching the prefix (can be only one if only one matches)
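After a bulk delete like this, a quick count of what is still listed can confirm the result. A minimal sketch, with a made-up function name and `client` assumed to behave like `boto3.client('s3')`:

```python
def count_remaining(client, bucket, prefix=''):
    """Count versions and delete markers still listed under `prefix`.

    `client` is assumed to behave like boto3.client('s3').
    """
    paginator = client.get_paginator('list_object_versions')
    remaining = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Both lists are omitted from a page when they are empty.
        remaining += len(page.get('Versions', []))
        remaining += len(page.get('DeleteMarkers', []))
    return remaining
```

A result of 0 for the prefix that was just deleted suggests nothing was left behind.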