Generating Parquet files named after the partition

Time: 2019-05-09 08:19:31

Tags: python-3.x amazon-s3 aws-glue

    datasink = glueContext.write_dynamic_frame.from_options(
        frame = f_repartition,
        connection_type = "s3",
        connection_options = {
            "path": "s3://"+ coreBucket +"/xxxx/input",
            "partitionKeys": ['api']
        },
        format = "parquet",
        transformation_ctx = "datasink")

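I want to generate one file per API number. Instead, the job creates n api folders, one per API value, and each folder contains part-files, for example api=0504500037/part-00000-ac3e86e7-ac8b-4c0c-9d51-d8772e90abdc.c000.snappy.parquet. What I want is a single file named API=0504500037.snappy.parquet.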
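As far as I know, the Glue S3 writer always produces Hive-style `api=<value>/part-*.parquet` keys and has no option for choosing the file name, so one possible workaround is to rename the objects after the job finishes. The sketch below is only an illustration under these assumptions: `flat_key` and `rename_part_files` are hypothetical helper names, `s3` is a boto3 S3 client, and each partition folder holds exactly one part-file (which could probably be arranged by repartitioning the underlying Spark DataFrame on `api` before writing).

```python
import re

# Matches Hive-style keys such as
# "<prefix>/api=0504500037/part-00000-<uuid>.c000.snappy.parquet".
_PART_FILE = re.compile(
    r"^(?P<parent>.*/)?(?P<partition>[^/=]+=[^/]+)/part-[^/]*?"
    r"(?P<ext>(\.snappy)?\.parquet)$"
)


def flat_key(key):
    """Return '<parent>/<name>=<value><ext>' for a part-file key, else None."""
    m = _PART_FILE.match(key)
    if m is None:
        return None
    return f"{m.group('parent') or ''}{m.group('partition')}{m.group('ext')}"


def rename_part_files(s3, bucket, prefix):
    """Copy each part-file under prefix to its flattened key, then delete it.

    s3 is assumed to be a boto3 S3 client. S3 has no rename operation,
    so a rename is a copy followed by a delete.
    """
    renamed = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            src = obj["Key"]
            dst = flat_key(src)
            if dst is None:
                continue  # not a part-file (e.g. a marker object)
            if dst in renamed:
                # Two part-files in one partition would overwrite each other.
                raise ValueError(f"more than one part-file maps to {dst}")
            renamed.add(dst)
            s3.copy_object(
                Bucket=bucket,
                Key=dst,
                CopySource={"Bucket": bucket, "Key": src},
            )
            s3.delete_object(Bucket=bucket, Key=src)
    return sorted(renamed)
```

With boto3 available, this could be called after the write as `rename_part_files(boto3.client("s3"), coreBucket, "xxxx/input/")`. Note that the flattened files no longer follow the partition-folder layout, so readers that rely on partition discovery would not see `api` as a column.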

0 answers:

No answers