I'm trying to partition and bucket data on S3 with AWS Glue, but bucketing has no effect; only partitioning works. How can I bucket data with AWS Glue?
datasink4 = glueContext.write_dynamic_frame.from_options(
    frame=dropnullfields3,
    connection_type="s3",
    connection_options={
        "path": s3_output_full,
        "partitionKeys": ["PARTITIONKEY"],  # partitioning works
        "bucketColumns": ["ROW_ID"],        # these bucketing options seem to be ignored
        "numberOfBuckets": 12,
    },
    format="parquet",
    transformation_ctx="datasink4")
job.commit()
Answer (score: 0):
I don't think those options are supported yet. My script uses Spark's bucketBy function instead; note, however, that it replaces any existing data at the defined path.
# transform_name and df are defined earlier in the job
df_name, job_df = str(transform_name), df
datasink_path = "s3://sink-bucket/job-data/"

# bucketBy only works with saveAsTable: it registers df_name in the
# metastore and writes bucketed Parquet files under datasink_path
job_df.write.format('parquet').mode("append") \
    .partitionBy('event_day') \
    .bucketBy(3, 'bucketed_field') \
    .saveAsTable(df_name, path=datasink_path)
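
Applied to the original job, a minimal sketch of the same workaround could look like the following. It assumes the dropnullfields3 DynamicFrame and s3_output_full path from the question, and that the job's Spark session can create tables in the metastore; the table name glue_bucketed_table is hypothetical.

# Sketch only: convert the DynamicFrame to a Spark DataFrame, then
# use Spark's writer, which supports bucketing via bucketBy.
spark_df = dropnullfields3.toDF()

spark_df.write.format('parquet').mode("append") \
    .partitionBy('PARTITIONKEY') \
    .bucketBy(12, 'ROW_ID') \
    .saveAsTable("glue_bucketed_table", path=s3_output_full)  # hypothetical table name

Since this write path goes through saveAsTable rather than the DynamicFrame sink, keep in mind the caveat above about data already present at the target path.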