Spark - saving a bucketed DataFrame to S3 (Parquet)

Asked: 2019-07-08 08:03:59

Tags: apache-spark amazon-s3 parquet

Is there a way to bucket a DataFrame by some key (to avoid shuffling in later joins/aggregations) and store it on S3 as Parquet?

Here is my attempt, which does not work :-(

# write the bucketed data to S3 as a metastore-backed table
df.write.bucketBy(10, 'account_id') \
    .saveAsTable('test_table', format='parquet', mode='overwrite',
                 path='s3a://path/aaa')

# read the data back directly from the Parquet files on S3
# (note: read uses the same s3a:// scheme as the write above)
df1 = spark.read.option('mergeSchema', 'false').parquet('s3a://path/aaa')
df1.createOrReplaceTempView('aaa')  # registerTempTable is deprecated

spark.sql('DESC FORMATTED test_table')  # result: Bucket Columns [`account_id`]
spark.sql('DESC FORMATTED aaa')         # this DataFrame is not bucketed :-(
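
Editor's note, not part of the original question: Spark records bucketing metadata in the metastore catalog, not in the Parquet files themselves, so reading the files back directly with spark.read.parquet discards it. A minimal sketch of reading through the table name instead, assuming df is the DataFrame from the question, the same placeholder S3 path, and a session with a Hive metastore available:

from pyspark.sql import SparkSession

# assumes a Hive metastore is reachable from this session
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# bucket metadata is recorded in the metastore, not in the Parquet files
df.write.bucketBy(10, 'account_id') \
    .saveAsTable('test_table', format='parquet', mode='overwrite',
                 path='s3a://path/aaa')

# reading via the table name (rather than spark.read.parquet on the path)
# returns a DataFrame that still carries the bucketing metadata
df2 = spark.table('test_table')
spark.sql('DESC FORMATTED test_table').show(truncate=False)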

Thanks!

0 Answers:

No answers yet.