AWS Glue: converting a column's values to uppercase

Time: 2019-12-22 00:30:46

Tags: pyspark pyspark-sql aws-glue

I am trying to convert a column's values to uppercase, but the field values come out in the same format as before. Any ideas?

Code that transforms the data

df = datasource0.toDF()

df = df.withColumn("Email", F.upper(F.col("Email"))) 

datasource2 = DynamicFrame.fromDF(df, glueContext, "datasource2")

Full code

# Imports needed by the custom ETL step below
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import functions as F

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "mydatabase", table_name = "table_email", transformation_ctx = "datasource0")

#########################################
## CUSTOM ETL
#########################################
# Convert to a Spark DataFrame, upper-case the Email column, then convert back
df = datasource0.toDF()

df = df.withColumn("Email", F.upper(F.col("Email")))

datasource2 = DynamicFrame.fromDF(df, glueContext, "datasource2")

# Map Email (string) to email (string); the input frame here is datasource0
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("Email", "string", "email", "string")], transformation_ctx = "applymapping1")

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")

dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")

# Write the result to S3 as Parquet
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://mysq3/data"}, format = "parquet", transformation_ctx = "datasink4")
job.commit()
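
A possible cause, judging only from the code above: the upper-cased result is stored in datasource2, while ApplyMapping is still given datasource0, so the transformed frame may never reach the sink. The sketch below models that handoff in plain Python so it runs without Glue or Spark. Lists of dicts stand in for frames, and `upper_email` and `apply_mapping` are hypothetical helpers written for this illustration, not Glue APIs.

```python
# Plain-Python model of the job: a "frame" is a list of row dicts.
# upper_email and apply_mapping are hypothetical stand-ins, not Glue APIs.

def upper_email(rows):
    """Return a NEW list with Email upper-cased; the input list is untouched,
    much like withColumn returns a new DataFrame instead of mutating one."""
    return [
        {**row, "Email": row["Email"].upper() if row["Email"] is not None else None}
        for row in rows
    ]

def apply_mapping(rows, mappings):
    """Rename columns using (source, target) pairs, a simplified ApplyMapping."""
    return [{target: row[source] for source, target in mappings} for row in rows]

datasource0 = [{"Email": "alice@example.com"}, {"Email": "Bob@Example.com"}]
datasource2 = upper_email(datasource0)

# Mapping the original frame, as the full code does: the upper-casing is lost.
lost = apply_mapping(datasource0, [("Email", "email")])

# Mapping the transformed frame: the upper-cased values are carried forward.
kept = apply_mapping(datasource2, [("Email", "email")])

print(lost)
print(kept)
```

If the same reasoning holds for the real job, passing `frame = datasource2` to `ApplyMapping.apply` would be the corresponding change; this has not been verified against an actual Glue run.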

0 answers:

No answers yet