How to mutate a DynamicFrame sub-array in a PySpark Map

Asked: 2019-07-11 13:29:02

Tags: pyspark aws-glue

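I have a DynamicFrame whose most complex part is a `types` key that points to an `array` of `structs`. For some reason I can't `Map` over each record and mutate the `types` key. How should this kind of operation be done in AWS Glue and PySpark?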

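When I write a `Map` function that mutates `types` and then write my `DynamicFrame` out as JSON, the output always has the `types` key as an empty array, even when I bind the result to a new key such as `blah_types` (see below).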

Single record example:

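```json
{
  "id": "1",
  "name": "rickroll",
  "types": [
    {"type": "basic", "id": "2"},
    {"type": "advanced", "id": "3"}
  ]
}
```

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping, Map
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job setup; job.commit() at the end needs an initialized Job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database="example", table_name="example", transformation_ctx="datasource0")

def MapRecord(rec):
    # Replace the array of structs with a flat array of their "id" values
    new_types = [t["id"] for t in rec["types"]]
    rec["types"] = new_types  # or even: rec["blah_types"] = new_types
    return rec

df_with_flat_types = Map.apply(frame=datasource0, f=MapRecord)

applymapping1 = ApplyMapping.apply(frame=df_with_flat_types, mappings=[("id", "string", "id", "string"), ("types", "array", "types", "array")], transformation_ctx="applymapping1")

datasink2 = glueContext.write_dynamic_frame.from_options(frame=applymapping1, connection_type="s3", connection_options={"path": "s3://my_bucket/output"}, format="json", transformation_ctx="datasink2")

job.commit()
```

I would at least expect my output records to have a `types` key containing an array of the `id`s that all of the sub-structs originally had.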
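As a local sanity check (a sketch, not Glue code), the same per-record logic can be run on a plain Python dict built from the single-record example in the question. `map_record` and `SAMPLE` are hypothetical names; `map_record` stands in for `MapRecord` but returns a new dict instead of mutating a Glue `DynamicRecord`, and it only illustrates the output shape the question expects.

```python
import json

# Sample record from the question, as a plain dict (NOT a Glue DynamicRecord).
SAMPLE = {
    "id": "1",
    "name": "rickroll",
    "types": [{"type": "basic", "id": "2"}, {"type": "advanced", "id": "3"}],
}


def map_record(rec):
    """Hypothetical stand-in for MapRecord: replace the array of structs
    under "types" with a flat array of their "id" values."""
    out = dict(rec)  # shallow copy so the input record is left untouched
    out["types"] = [t["id"] for t in rec.get("types", [])]
    return out


if __name__ == "__main__":
    print(json.dumps(map_record(SAMPLE)))
    # prints {"id": "1", "name": "rickroll", "types": ["2", "3"]}
```

If this prints the expected ids, the list comprehension itself is sound, which suggests the empty array is introduced elsewhere in the Glue pipeline; that is an inference from this check, not a confirmed diagnosis.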

0 answers