How to read parquet files from an AWS S3 bucket into pandas in Jupyter using pyspark

Date: 2020-07-16 23:08:46

Tags: python pandas amazon-s3

I want to read several parquet files from an AWS S3 bucket and combine them all into a single pandas DataFrame, using pyspark.

bucket = s3.Bucket(name='mybucket')
keys =[]
for obj in bucket.objects.all():
    subsrc = obj.Object()
    print(obj.key)
    keys.append(obj.key)

objects = []
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
    objects.append(obj)
 

count = 0
df_list = []
for file in objects:
    obj = s3.get_object(Bucket="my-bucket", Key=keys[count])
    obj_df = pd.read_parquet(obj["Body"])
    df_list.append(obj_df)
    count+=1

df = pd.concat(df_list)

But I get:

AttributeError: 's3.ServiceResource' object has no attribute 'get_object'

I am also not sure how the parquet files need to be read correctly. Thanks!

0 Answers:

There are no answers yet.