Map either of two source fields to a single target field

Time: 2018-02-27 15:03:07

Tags: etl aws-glue

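I'm new to AWS Glue and I'm stuck on a problem. We recently renamed a field in our database, and I can't figure out how to create a mapping in Glue that supports both the old and the new field name.

The legacy mapping looks like: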

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")

We normalized the JSON property names, so json_property['Foo Bar'] became json_property.foo_bar. I tried this:

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [...("json_property.Foo Bar", "string", "foo_bar", "string"), ("json_property.foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")

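Basically, I tried to map two source fields to the same target field. As expected, this caused an error when I tried to run the job...

Is there some way I can have this process populate the foo_bar target field from either json_property.foo_bar or json_property['Foo Bar'] (whichever is present)?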

1 Answer:

Answer 0: (score: 2)

I figured it out by adding a Map step before ApplyMapping that copies the legacy field name onto the updated field name

## @type: DataSource
## @args: [database = "s3 olap", table_name = "example", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "s3 olap", table_name = "example", transformation_ctx = "datasource0")

## @type: Map
## @args: [f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields"]
## @return: datasource_mapped
## @inputs: [frame = datasource0]
def MergeLegacyFields(rec):
  if 'Foo Bar' in rec:
    rec['foo_bar'] = rec['Foo Bar']
  return rec

datasource_mapped = Map.apply(frame = datasource0, f = MergeLegacyFields, transformation_ctx = "merge_legacy_fields")

## @type: ApplyMapping
## @args: [mapping = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource_mapped]
applymapping1 = ApplyMapping.apply(frame = datasource_mapped, mappings = [("foo_bar", "string", "foo_bar", "string")], transformation_ctx = "applymapping1")
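
The function above checks for a top-level 'Foo Bar' key, while the field in the question sits under json_property. Here is a minimal sketch of the same idea for the nested case, written against plain Python dicts so it can be tried outside Glue. It assumes the record passed to the Map function behaves like a dict and that json_property is itself dict-like. The name merge_legacy_fields and the choice to delete the legacy key are illustrative, not part of the original job.

```python
def merge_legacy_fields(rec):
    """Fold the legacy 'Foo Bar' key under json_property into 'foo_bar'."""
    props = rec.get("json_property")
    if isinstance(props, dict) and "Foo Bar" in props:
        # Keep the new key if it is already populated; otherwise use the legacy value.
        if props.get("foo_bar") is None:
            props["foo_bar"] = props["Foo Bar"]
        # Remove the legacy key so later steps only see one field (an assumption;
        # keep it if downstream steps still need the old name).
        del props["Foo Bar"]
    return rec


# Plain-dict stand-ins for a legacy record and a current record.
legacy = merge_legacy_fields({"json_property": {"Foo Bar": "a"}})
current = merge_legacy_fields({"json_property": {"foo_bar": "b"}})
print(legacy)   # {'json_property': {'foo_bar': 'a'}}
print(current)  # {'json_property': {'foo_bar': 'b'}}
```

If something like this were used in the job, the ApplyMapping source path would presumably be "json_property.foo_bar" rather than "foo_bar", matching the mappings shown in the question.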