I want to read several Parquet files from an AWS bucket and, using pyspark, combine them all into a single pandas DataFrame.
bucket = s3.Bucket(name='mybucket')
keys = []
for obj in bucket.objects.all():
    subsrc = obj.Object()
    print(obj.key)
    keys.append(obj.key)
objects = []
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
    objects.append(obj)
count = 0
for file in objects:
    obj = s3.get_object(Bucket="my-bucket", Key=keys[count])
    obj_df = pd.read_parquet(obj["Body"])
    df_list.append(obj_df)
    count += 1
df = pd.concat(df_list)
But I get:
AttributeError: 's3.ServiceResource' object has no attribute 'get_object'
I'm also not sure how the Parquet files need to be read correctly. Thanks!
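
For reference, here is a minimal sketch of one way the read could work. The error arises because `get_object` is a method of the boto3 client (`boto3.client('s3')`), while `s3` here is a resource (`boto3.resource('s3')`). The object summaries returned by `bucket.objects` do have their own `get()` method, so a single loop may be enough. The function name `read_parquet_from_bucket` and the `prefix` parameter are hypothetical names chosen for illustration. The sketch uses pandas only (no pyspark) and assumes a Parquet engine such as pyarrow or fastparquet is installed.

```python
import io

import pandas as pd


def read_parquet_from_bucket(bucket, prefix=""):
    """Read every .parquet object under `prefix` into one DataFrame.

    `bucket` is assumed to be a boto3 Bucket resource, for example
    boto3.resource("s3").Bucket("mybucket").
    """
    frames = []
    for summary in bucket.objects.filter(Prefix=prefix):
        if not summary.key.endswith(".parquet"):
            continue  # skip folder markers and non-Parquet objects
        # get() is available on the object summary itself;
        # get_object() exists only on the low-level client.
        body = summary.get()["Body"].read()
        # Wrap the bytes in BytesIO so the reader gets a seekable file object.
        frames.append(pd.read_parquet(io.BytesIO(body)))
    if not frames:
        raise ValueError("no .parquet objects found")
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

It could be called as `df = read_parquet_from_bucket(boto3.resource("s3").Bucket("mybucket"))`. Note that the code above uses both `'mybucket'` and `"my-bucket"`, so the bucket name would need to be made consistent as well.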