I tried using boto, but its .list() and .get_all_keys() methods take a very long time on my data set, while [...] returns the keys in random order. I want to get the 100-1000 newest keys in my S3 bucket, which holds millions of keys. What is the most efficient way to do this?
Answer 0 (score: 0)
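import boto3

client = boto3.client('s3')
start_after = ''  # empty on the first request; later set to the last key seen
# StartAfter is a parameter of list_objects_v2, not of list_objects
response = client.list_objects_v2(Bucket='<bucket>', StartAfter=start_after, MaxKeys=1000)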
Save response['Contents']; each entry in it carries a LastModified key:
'Contents': [
    {
        'Key': 'string',
        'LastModified': datetime(2015, 1, 1),
        'ETag': 'string',
        'Size': 123,
        'StorageClass': 'STANDARD'|'REDUCED_REDUNDANCY'|'GLACIER'|'STANDARD_IA'|'ONEZONE_IA',
        'Owner': {
            'DisplayName': 'string',
            'ID': 'string'
        }
    },
],
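Take the last key from these 1000 records, assign it to the start_after variable, and send the request again.
The new request returns the keys that come after the one supplied in StartAfter. Note that S3 returns keys in lexicographic order, not by LastModified, so every page has to be scanned before you know which keys are the newest.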
https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
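The loop described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name newest_keys and its parameters are made up for this example, client is expected to be a boto3 S3 client (or anything exposing a compatible list_objects_v2), and a small heap is used so that only the n newest entries are kept in memory while the whole bucket is scanned.

```python
import heapq


def newest_keys(client, bucket, n=1000, page_size=1000):
    """Scan every key in `bucket` and return the n newest keys, newest first.

    `newest_keys` is a hypothetical helper, not part of boto3.
    """
    newest = []  # min-heap of (LastModified, Key) holding at most n entries
    start_after = ""
    while True:
        kwargs = {"Bucket": bucket, "MaxKeys": page_size}
        if start_after:
            # Resume listing after the last key of the previous page.
            kwargs["StartAfter"] = start_after
        response = client.list_objects_v2(**kwargs)
        contents = response.get("Contents", [])
        if not contents:
            break
        for obj in contents:
            item = (obj["LastModified"], obj["Key"])
            if len(newest) < n:
                heapq.heappush(newest, item)
            elif item > newest[0]:
                # Newer than the oldest entry kept so far: swap it in.
                heapq.heapreplace(newest, item)
        start_after = contents[-1]["Key"]
        if not response.get("IsTruncated"):
            break
    return [key for _, key in sorted(newest, reverse=True)]
```

With a real client this would be called as newest_keys(boto3.client('s3'), '<bucket>', n=1000). It still issues one request per 1000 keys, so for millions of keys it stays slow; it only keeps memory use bounded.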
Answer 1 (score: 0)
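If you don't mind the data being slightly out of date, you can use Amazon S3 Inventory, which can deliver a daily CSV file listing every object in an Amazon S3 bucket:
Amazon S3 inventory provides comma-separated values (CSV) or Apache optimized row columnar (ORC) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
You can parse this file to get each key and its last-modified date, then sort by date.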
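As a rough sketch of that parsing step, assuming the inventory file has already been downloaded and decompressed: the function name newest_from_inventory is hypothetical, and the column positions key_col and modified_col are assumptions that should be taken from the fileSchema entry in the inventory's manifest.json, since the CSV itself has no header row.

```python
import csv
import heapq
from datetime import datetime


def newest_from_inventory(csv_file, n=1000, key_col=1, modified_col=2):
    """Return the n newest keys from an open inventory CSV, newest first.

    `key_col` and `modified_col` are assumed column positions; check them
    against the fileSchema listed in the inventory manifest.
    """
    def rows():
        for row in csv.reader(csv_file):
            # Timestamps look like 2020-01-01T00:00:00.000Z; rewrite the
            # trailing Z so datetime.fromisoformat accepts it everywhere.
            stamp = row[modified_col].replace("Z", "+00:00")
            yield datetime.fromisoformat(stamp), row[key_col]

    # nlargest streams the rows and keeps only n of them in memory.
    return [key for _, key in heapq.nlargest(n, rows())]
```

It can be called on any text file object, for example with open('inventory.csv', newline='') as f: newest_from_inventory(f, n=1000), where the file name is a placeholder.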