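I have a DynamicFrame structure whose most complex aspect is a types key that points to an array of structs. For some reason, I am unable to Map over each record and mutate the types key. How is this type of operation supposed to be done in AWS Glue and PySpark?
When I write a Map function to mutate types and write my DynamicFrame out as JSON, the output always has the types key as an empty array, even if I bind the result to a new key such as blah_types (see below).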
Single record example:
{
    "id": "1",
    "name": "rickroll",
    "types": [
        {"type": "basic", "id": "2"},
        {"type": "advanced", "id": "3"}
    ]
}

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "example", table_name = "example", transformation_ctx = "datasource0")
def MapRecord(rec):
    new_types = [t["id"] for t in rec["types"]]
    rec["types"] = new_types
    # or even
    rec["blah_types"] = new_types
    return rec
df_with_flat_types = Map.apply(frame = datasource0, f = MapRecord)
applymapping1 = ApplyMapping.apply(frame = df_with_flat_types, mappings = [("id", "string", "id", "string"),("types", "array", "types", "array") ], transformation_ctx = "applymapping1")
datasink2 = glueContext.write_dynamic_frame.from_options(frame = applymapping1, connection_type = "s3", connection_options = {"path": "s3://my_bucket/output"}, format = "json", transformation_ctx = "datasink2")
job.commit()
At a minimum, I am hoping that my output records have a key containing all of the substructs that types originally had.
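For reference, the per-record logic can be checked outside Glue by running it on the sample record as a plain Python dict. This is only a minimal sketch: it assumes a record behaves like an ordinary dict, and the names map_record, sample, and result are hypothetical and not part of the job. It does not exercise Map.apply, ApplyMapping, or the JSON writer, so it can only show whether the mapping function itself produces the expected values.

```python
import copy


def map_record(rec):
    """Return a copy of rec with the ids of rec["types"] stored under a new key.

    Hypothetical stand-in for MapRecord; assumes rec is a plain dict.
    The original "types" list of structs is left untouched.
    """
    out = copy.deepcopy(rec)
    out["blah_types"] = [t["id"] for t in out.get("types", [])]
    return out


# The single record example from the question, as a plain dict.
sample = {
    "id": "1",
    "name": "rickroll",
    "types": [
        {"type": "basic", "id": "2"},
        {"type": "advanced", "id": "3"},
    ],
}

result = map_record(sample)
print(result["blah_types"])                # ['2', '3']
print(result["types"] == sample["types"])  # True
```

If this gives the expected values locally, the empty arrays are more likely introduced by a later step in the job than by the list comprehension, although that is an inference rather than something this sketch proves.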