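I'm new to AWS Glue and I'm stuck on a problem. We recently renamed a field in our database, and now I can't figure out how to create a mapping in Glue that supports both the old and the new field name.
The legacy mapping looks like this: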
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
We normalized the JSON property names, so json_property['Foo Bar'] became json_property.foo_bar. I tried this:
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string"), ("json_property.foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
Basically, I tried to map two source fields to the same target field. As expected, this caused an error when I ran the job...
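Each ApplyMapping tuple reads as (source path, source type, target path, target type), so the attempt above declares two different sources for the single target column foo_bar. The helper below is a hypothetical pure-Python check, not part of the Glue API, that flags such collisions before a job runs. It is only a sketch of the conflict and does not reproduce Glue's actual error.

```python
from collections import defaultdict


def find_duplicate_targets(mappings):
    """Return {target_path: [source_paths]} for targets declared more than once.

    Hypothetical helper (not a Glue API). Each mapping is assumed to be a
    4-tuple in ApplyMapping order:
    (source_path, source_type, target_path, target_type).
    """
    sources_by_target = defaultdict(list)
    for source_path, _source_type, target_path, _target_type in mappings:
        sources_by_target[target_path].append(source_path)
    # Keep only the targets that more than one source writes to
    return {t: s for t, s in sources_by_target.items() if len(s) > 1}


mappings = [
    ("json_property.Foo Bar", "string", "foo_bar", "string"),
    ("json_property.foo_bar", "string", "foo_bar", "string"),
]
print(find_duplicate_targets(mappings))
# {'foo_bar': ['json_property.Foo Bar', 'json_property.foo_bar']}
```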
Is there any way I can have the foo_bar target field populated from either the source json_property.foo_bar or json_property['Foo Bar'] (whichever comes first)?
Answer:
I figured it out by adding a Map step before ApplyMapping that maps the legacy field name to the updated field name:
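from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping, Map
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

## @type: DataSource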
## @args: [database = "s3 olap", table_name = "example", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "s3 olap", table_name = "example", transformation_ctx = "datasource0")
## @type: Map
## @args: [f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields"]
## @return: datasource_mapped
## @inputs: [frame = datasource0]
def MergeLegacyFields(rec):
    # Copy the legacy "Foo Bar" value into the new "foo_bar" field when present
    if 'Foo Bar' in rec:
        rec['foo_bar'] = rec['Foo Bar']
    return rec
datasource_mapped = Map.apply(frame = datasource0, f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields")
## @type: ApplyMapping
## @args: [mapping = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource_mapped]
applymapping1 = ApplyMapping.apply(frame = datasource_mapped, mappings = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
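The function above reads 'Foo Bar' at the top level of the record. If the field still sits inside json_property, as in the question, a variant along these lines may help. It is a sketch that assumes Map hands the function each record as a dict with json_property as a nested dict; merge_legacy_fields, LEGACY_KEY and NEW_KEY are hypothetical names. Unlike the answer's version, it prefers the normalized key when both are present and drops the legacy key. It can be exercised on plain dicts outside Glue:

```python
LEGACY_KEY = "Foo Bar"  # hypothetical: old property name inside json_property
NEW_KEY = "foo_bar"     # hypothetical: normalized property name


def merge_legacy_fields(rec):
    """Fill json_property['foo_bar'] from json_property['Foo Bar'] when the
    normalized key is missing, then remove the legacy key.

    Assumes rec behaves like a dict and json_property (if any) is a nested dict.
    """
    props = rec.get("json_property")
    if isinstance(props, dict) and LEGACY_KEY in props:
        legacy_value = props.pop(LEGACY_KEY)
        # Prefer the normalized key; use the legacy value only as a fallback
        if props.get(NEW_KEY) is None:
            props[NEW_KEY] = legacy_value
    return rec


for r in (
    {"json_property": {"Foo Bar": "a"}},
    {"json_property": {"foo_bar": "b"}},
    {"json_property": {"Foo Bar": "a", "foo_bar": "b"}},
):
    print(merge_legacy_fields(r))
# {'json_property': {'foo_bar': 'a'}}
# {'json_property': {'foo_bar': 'b'}}
# {'json_property': {'foo_bar': 'b'}}
```

With the legacy key folded in, a single tuple such as ("json_property.foo_bar", "string", "foo_bar", "string") should then be enough in ApplyMapping, avoiding the duplicate-target conflict.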