How to read multiple parquet files from an AWS S3 bucket and convert them into a single pandas DataFrame

Time: 2020-07-16 23:34:46

Tags: python pandas amazon-web-services amazon-s3

I am trying to read multiple parquet files from an AWS S3 bucket and convert them all into one large pandas DataFrame. I have:

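import boto3

# s3 is the boto3 resource object (the s3.ServiceResource named in the error below)
s3 = boto3.resource('s3')
bucket = s3.Bucket(name='mybucket')
objects = []
keys = []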

for obj in bucket.objects.all():
    subsrc = obj.Object()
    key = obj.key
    body = obj.get()['Body'].read()
    objects.append(body)
    keys.append(key)

 

But when I print objects[0], it is just the letter "b".

I was also thinking about doing something like this:

count = 0
for file in bucket.objects.all():
    obj = s3.get_object(Bucket="my-bucket", Key=keys[count])
    obj_df = pd.read_parquet(obj["Body"])
    df_list.append(obj_df)
    count += 1

But this gives me:

AttributeError: 's3.ServiceResource' object has no attribute 'get_object'

Then, when I comment out the get_object line, I get:

TypeError: Cannot convert bytes to pyarrow.lib.NativeFile

Thank you very much!

0 answers:

No answers