Python: Get the 100 most recent keys in an Amazon S3 bucket

Time: 2018-08-13 18:25:53

Tags: python django amazon-web-services amazon-s3 boto

I tried boto, but its .list() and .get_all_keys() methods take too long on my dataset, while … randomizes the order. I want to get the 100-1000 most recent keys in my S3 bucket, which contains millions of keys. What is the most efficient way to do this?

2 answers:

Answer 0 (score: 0)

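import boto3

client = boto3.client('s3')

# StartAfter: return only keys that sort after this key ('' = start from the beginning)
start_after = ''

# StartAfter is a parameter of list_objects_v2, not of the older list_objects call
response = client.list_objects_v2(Bucket='<bucket>', StartAfter=start_after, MaxKeys=1000)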

Save response['Contents']; each entry in it has a LastModified field:

'Contents': [
    {
        'Key': 'string',
        'LastModified': datetime(2015, 1, 1),
        'ETag': 'string',
        'Size': 123,
        'StorageClass': 'STANDARD'|'REDUCED_REDUNDANCY'|'GLACIER'|'STANDARD_IA'|'ONEZONE_IA',
        'Owner': {
            'DisplayName': 'string',
            'ID': 'string'
        }
    },
],

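Take the last key from these 1000 records, assign it to the start_after variable, and make the request again.

The new request returns the keys that come after the key given in StartAfter. Repeat until IsTruncated in the response is False.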

https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
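A minimal sketch of the StartAfter loop described in this answer, keeping only the n newest objects in a small heap while paging. It assumes `client` is a boto3 S3 client; `newest_keys` and its `n` parameter are hypothetical names for illustration. As far as I know, ListObjectsV2 returns keys in key-name (UTF-8 binary) order rather than by date, so every page still has to be fetched; the heap only keeps memory bounded, it does not reduce the number of requests.

```python
import heapq
from datetime import datetime
from typing import Any, Dict, List, Tuple


def newest_keys(client: Any, bucket: str, n: int = 100) -> List[Tuple[datetime, str]]:
    """Return up to n (LastModified, Key) pairs, newest first.

    Assumes `client` behaves like boto3.client('s3'). Listing is in key order,
    so all pages are scanned; only the n newest entries are held in memory.
    """
    if n <= 0:
        return []
    heap: List[Tuple[datetime, str]] = []  # min-heap: heap[0] is the oldest kept entry
    start_after = ''
    while True:
        kwargs: Dict[str, Any] = {'Bucket': bucket, 'MaxKeys': 1000}
        if start_after:
            kwargs['StartAfter'] = start_after
        response = client.list_objects_v2(**kwargs)
        contents = response.get('Contents', [])  # key is absent when the page is empty
        for obj in contents:
            item = (obj['LastModified'], obj['Key'])
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # drop the oldest, keep the newer one
        if not response.get('IsTruncated') or not contents:
            break
        # As in the answer: the last key of this page becomes the next StartAfter.
        start_after = contents[-1]['Key']
    return sorted(heap, reverse=True)


# Usage (needs AWS credentials; 'my-bucket' is a placeholder):
#   import boto3
#   for modified, key in newest_keys(boto3.client('s3'), 'my-bucket', n=100):
#       print(modified, key)
```

boto3 also offers `client.get_paginator('list_objects_v2')`, which follows continuation tokens for you; the manual loop above just mirrors the StartAfter approach from this answer.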

Answer 1 (score: 0)

If you don't mind the data being slightly out of date, you can use Amazon S3 Inventory, which can provide a daily CSV file listing all objects in an Amazon S3 bucket:

  

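Amazon S3 inventory provides comma-separated values (CSV) or Apache optimized row columnar (ORC) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).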

You can parse this file to get the keys and last-modified dates, then sort by date.
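A minimal sketch of that parsing step for one inventory data file. Assumptions: the CSV output format with gzipped data files, column order taken from the `fileSchema` string in the inventory's manifest.json (for example "Bucket, Key, Size, LastModifiedDate"), and LastModifiedDate values formatted like 2018-08-13T18:25:53.000Z. The function name `newest_from_inventory` is hypothetical. An inventory can be split across several data files listed in the manifest; you would run this on each file and merge the per-file results.

```python
import csv
import gzip
import heapq
from datetime import datetime
from typing import BinaryIO, List, Tuple


def newest_from_inventory(fileobj: BinaryIO, schema: str, n: int = 100) -> List[Tuple[datetime, str]]:
    """Return up to n (LastModified, Key) pairs from one gzipped inventory CSV, newest first.

    `schema` is assumed to be the manifest's comma-separated fileSchema string.
    The data files have no header row, so column positions come from the schema.
    """
    columns = [c.strip() for c in schema.split(',')]
    key_idx = columns.index('Key')
    date_idx = columns.index('LastModifiedDate')
    with gzip.open(fileobj, 'rt', newline='') as text:
        pairs = (
            # Replace the trailing 'Z' so fromisoformat also works before Python 3.11.
            (datetime.fromisoformat(row[date_idx].replace('Z', '+00:00')),
             # Keys are URL-encoded in CSV inventories; decode them before using them with S3.
             row[key_idx])
            for row in csv.reader(text)
            if row  # skip blank lines, if any
        )
        return heapq.nlargest(n, pairs)
```

To combine several data files, you could pass the concatenated per-file results through `heapq.nlargest(n, ...)` once more.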