How do I list all public AWS S3 objects?

Time: 2017-02-09 16:35:58

Tags: amazon-web-services amazon-s3

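I want to list all of the objects in my S3 buckets that are public. Using get-object-acl lists the grantees for one specific object, so I was wondering whether there is a better option.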

3 answers:

Answer 0: (score: 1)

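Relying on get-object-acl is probably not what you want to do, because an object can be made public by means other than its ACL. At the very least, this is possible both through the object's ACL and through the bucket's policy (see e.g. https://havecamerawilltravel.com/photographer/how-allow-public-access-amazon-bucket/), and perhaps through other means I don't know about.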
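To make the bucket-policy route concrete, here is a minimal sketch that inspects a policy document for a statement granting read access to everyone. The function names are hypothetical, and the check is deliberately rough: it ignores Condition blocks, Deny statements, and action wildcards such as s3:Get*, so a True result means "possibly public", not a verdict.

```python
import json
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """Policy fields may hold a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def policy_allows_public_read(policy_json: str) -> bool:
    """
    Rough check: does any Allow statement grant s3:GetObject to everyone?
    Ignores Condition blocks, Deny statements, and partial wildcards.
    """
    policy: Dict[str, Any] = json.loads(policy_json)
    for stmt in _as_list(policy.get('Statement', [])):
        if stmt.get('Effect') != 'Allow':
            continue
        principal = stmt.get('Principal')
        # "Principal": "*" and "Principal": {"AWS": "*"} both mean everyone.
        everyone = principal == '*' or (
            isinstance(principal, dict)
            and '*' in _as_list(principal.get('AWS', []))
        )
        actions = [a.lower() for a in _as_list(stmt.get('Action', []))]
        reads = any(a in ('s3:getobject', 's3:*', '*') for a in actions)
        if everyone and reads:
            return True
    return False
```

With Boto 3, the policy text could be fetched with s3.get_bucket_policy(Bucket='bucketnamehere')['Policy'] and passed straight in; that call raises an error for buckets that have no policy at all, which would need handling.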

A smarter test is to make a HEAD request to each object without credentials. If you get a 200, the object is public. If you get a 403, it is not.

The steps are as follows:

  1. Get the list of buckets with the ListBuckets endpoint. From the CLI, this is:

    aws s3api list-buckets
    
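  2. For each bucket, get its region and list its objects. From the CLI (assuming you have configured credentials for it), these two things are done with the following two commands, respectively:

    aws s3api get-bucket-location --bucket bucketnamehere
    aws s3api list-objects --bucket bucketnamehere

  3. For each object, make a HEAD request to a URL like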

    https://bucketname.s3.us-east-1.amazonaws.com/objectname

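    replacing bucketname, us-east-1, and objectname with your bucket name, the actual name of the bucket's region, and your object name, respectively.

    To do this from the Unix command line using curl, run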

    curl -I https://bucketname.s3.us-east-1.amazonaws.com/objectname
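Two details in steps 2 and 3 are easy to trip over: GetBucketLocation reports no LocationConstraint for buckets in us-east-1 (and the legacy value EU for some older eu-west-1 buckets), and object names may contain characters that need percent-encoding in a URL. A small sketch of building the URL with both handled; normalize_region and object_url are hypothetical helper names, not part of any AWS library.

```python
from typing import Optional
from urllib.parse import quote


def normalize_region(location_constraint: Optional[str]) -> str:
    """Map a GetBucketLocation LocationConstraint to a usable region name."""
    if not location_constraint:
        return 'us-east-1'  # us-east-1 buckets report no constraint
    if location_constraint == 'EU':
        return 'eu-west-1'  # legacy alias returned for some older buckets
    return location_constraint


def object_url(bucket: str, location_constraint: Optional[str], key: str) -> str:
    """Build the virtual-hosted-style URL used for the HEAD request."""
    region = normalize_region(location_constraint)
    # quote() leaves "/" alone by default, so key "folders" stay intact.
    return f'https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}'
```

For instance, object_url('bucketname', None, 'objectname') produces the same URL as the curl example above.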

An example implementation of the logic above in Python, using Boto 3 and Requests:

from typing import Iterator
import boto3
import requests

s3 = boto3.client('s3')
all_buckets = [
    bucket_dict['Name'] for bucket_dict in
    s3.list_buckets()['Buckets']
]

def list_objs(bucket: str) -> Iterator[str]:
    """
    Generator yielding all object names in the bucket. Potentially requires
    multiple requests for large buckets since list_objects_v2 is capped at 1000
    objects returned per call.
    """
    response = s3.list_objects_v2(Bucket=bucket)
    while True:
        if 'Contents' not in response:
            # Happens if bucket is empty
            return
        for obj_dict in response['Contents']:
            yield obj_dict['Key']
            last_key = obj_dict['Key']
        if response['IsTruncated']:
            response = s3.list_objects_v2(Bucket=bucket, StartAfter=last_key)
        else:
            return

def is_public(bucket: str, region: str, obj: str) -> bool:
    # Percent-encode the key so names with spaces or special characters work.
    url = f'https://{bucket}.s3.{region}.amazonaws.com/{requests.utils.quote(obj)}'
    resp = requests.head(url)
    if resp.status_code == 200:
        return True
    elif resp.status_code == 403:
        return False
    else:
        raise Exception(f'Unexpected HTTP code {resp.status_code} from {url}')

for bucket in all_buckets:
    # LocationConstraint is None for buckets in us-east-1.
    region = s3.get_bucket_location(Bucket=bucket)['LocationConstraint'] or 'us-east-1'
    for obj in list_objs(bucket):
        if is_public(bucket, region, obj):
            print(f'{bucket}/{obj} is public')

Be warned that this takes about a second per object, which is not ideal if you have a lot of stuff in S3. I don't know of a faster alternative, though.
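One possible way to cut the wall-clock time, offered as a sketch and not as a measured result: each check spends nearly all of its time waiting on the network, so the checks can overlap on a thread pool. check_all is a hypothetical helper, and max_workers=16 is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Tuple


def check_all(
    keys: Iterable[str],
    check: Callable[[str], bool],
    max_workers: int = 16,
) -> Iterator[Tuple[str, bool]]:
    """
    Run check(key) for every key on a thread pool and yield (key, result)
    pairs in input order. All keys are held in memory while this runs.
    """
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its result.
        yield from zip(key_list, pool.map(check, key_list))
```

Inside the per-bucket loop above, it could be used as: for obj, public in check_all(list_objs(bucket), lambda k: is_public(bucket, region, k)). An exception raised by a check is re-raised when its result is reached.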

Answer 1: (score: 0)

Put the name of the bucket, or a list of buckets, into a file named "buckets.list" and run the bash script below.

The script supports an unlimited (!) number of objects because it uses pagination.

#!/bin/bash

MAX_ITEMS=100
PAGE_SIZE=100
ALL_USERS='http://acs.amazonaws.com/groups/global/AllUsers'

for BUCKET in $(cat buckets.list);
do
    NEXT_TOKEN=""

    # Fetch one page per iteration until the CLI stops returning a NextToken.
    while true
    do
        ARGS=(--bucket "$BUCKET" --max-items "$MAX_ITEMS" --page-size "$PAGE_SIZE" --output json)
        if [[ -n "$NEXT_TOKEN" ]]; then
            ARGS+=(--starting-token "$NEXT_TOKEN")
        fi

        if ! PAGE=$(aws s3api list-objects-v2 "${ARGS[@]}" 2>/dev/null); then
            echo "Could not list objects in $BUCKET!"
            echo -e "$BUCKET" "list-objects-v2 failed" >> errors.log
            break
        fi

        # Read keys line by line so that keys containing spaces stay intact.
        while IFS= read -r OBJECT
        do
            # An empty result means the ACL has no grant for the AllUsers group.
            if ! ACL=$(aws s3api get-object-acl --bucket "$BUCKET" --key "$OBJECT" --query "Grants[?Grantee.URI=='$ALL_USERS']" --output text 2>/dev/null </dev/null); then
                echo "Could not read the ACL of $BUCKET/$OBJECT!"
                echo -e "$BUCKET" "$OBJECT" "get-object-acl failed" >> errors.log
            elif [[ -n "$ACL" ]]; then
                echo -e "$BUCKET" "$OBJECT" "Public object!!!" "$ACL"
                echo -e "$BUCKET" "$OBJECT" "$ACL" >> public-objects.log
            else
                echo -e "$BUCKET" "$OBJECT" "not public"
            fi
        done < <(echo "$PAGE" | jq -r '.Contents[]?.Key')

        # jq prints nothing when NextToken is absent, which ends the loop.
        NEXT_TOKEN=$(echo "$PAGE" | jq -r '.NextToken // empty')
        if [[ -z "$NEXT_TOKEN" ]]; then
            break
        fi
    done
done
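The core of the script is the filter on the AllUsers group URI. The same filter can be sketched in Python over the dictionary that Boto 3's get_object_acl returns; public_grants is a hypothetical helper name. Like the script, it looks only at object ACLs, so it would not notice objects made public through a bucket policy.

```python
from typing import Any, Dict, List

ALL_USERS_URI = 'http://acs.amazonaws.com/groups/global/AllUsers'


def public_grants(acl: Dict[str, Any]) -> List[str]:
    """Return the permissions an object ACL grants to the AllUsers group."""
    return [
        grant['Permission']
        for grant in acl.get('Grants', [])
        # Grants to individual accounts carry an ID and no URI, so .get() is used.
        if grant.get('Grantee', {}).get('URI') == ALL_USERS_URI
    ]
```

Assuming s3 is a Boto 3 S3 client, public_grants(s3.get_object_acl(Bucket=bucket, Key=key)) returns an empty list when the ACL grants nothing to anonymous users.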

Answer 2: (score: -1)

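After spending some time with the AWS CLI, I can tell you that the best approach is to sync, mv, or cp files with permissions under structured prefixes. Permission: specifies the granted permissions, and can be set to read, readacl, writeacl, or full.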

For example: aws s3 sync . s3://my-bucket/path --acl public-read

Then list all of those objects under the required prefix.
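A minimal sketch of that last listing step with Boto 3, assuming the public objects really were all uploaded under one prefix. keys_under_prefix is a hypothetical helper; it takes the client as an argument and relies on the list_objects_v2 paginator to follow continuation tokens.

```python
from typing import Any, Iterator


def keys_under_prefix(client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every key in the bucket that starts with the given prefix."""
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing carry no 'Contents' entry.
        for obj in page.get('Contents', []):
            yield obj['Key']
```

With the example above, this might be called as keys_under_prefix(boto3.client('s3'), 'my-bucket', 'path/').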