Question

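I have a Glue job that reads data from S3, runs a few SQL queries on it, and then writes the result to Redshift. I'm running into a strange problem: when I write the dynamic_frame to Redshift (using glueContext.write_dynamic_frame.from_options), new columns are created. They are copies of some of my existing columns, with the column type appended to the name. For example, if my frame schema is: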

id: string
value: short
value2: long
ts: timestamp

In Redshift I see:

id: varchar(256)
value: smallint    <---- The data here is always null
value2: bigint     <---- The data here is always null
ts: timestamp      
value_short: smallint
value2_long: bigint

The value_short and value2_long columns are created during execution (I'm currently testing with credentials that have ALTER TABLE permission).

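Looking at the COPY command that gets run, I can see the value_short and value2_long columns in it. However, before writing with glueContext.write_dynamic_frame.from_options, these columns do not appear in the dynamic frame.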
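To spot the symptom programmatically, one can compare the Redshift table's column names against the frame's schema and flag any name that looks like an existing column with its type appended. This is a minimal, stdlib-only sketch: `find_type_suffixed_columns` is a hypothetical helper, and the `<column>_<type>` naming rule is inferred only from the example above (`value_short`, `value2_long`), not from any documented Glue behavior.

```python
def find_type_suffixed_columns(frame_schema, table_columns):
    """Return table columns that look like '<frame column>_<frame type>'.

    frame_schema: dict mapping column name -> Glue type name (e.g. "short").
    table_columns: iterable of column names found in the Redshift table.
    The '<name>_<type>' pattern is an assumption based on the observed
    value_short / value2_long columns.
    """
    # Build every candidate suffixed name from the frame's own schema.
    candidates = {
        f"{name}_{type_name}": name for name, type_name in frame_schema.items()
    }
    # Keep only table columns matching a candidate, mapped to the original column.
    return {col: candidates[col] for col in table_columns if col in candidates}


schema = {"id": "string", "value": "short", "value2": "long", "ts": "timestamp"}
redshift_columns = ["id", "value", "value2", "ts", "value_short", "value2_long"]
print(find_type_suffixed_columns(schema, redshift_columns))
# {'value_short': 'value', 'value2_long': 'value2'}
```

The result pairs each extra column with the frame column it shadows, which are exactly the columns whose original copies stay null.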

Answer 1

Explicitly casting the types, as aloissiola suggested, solved the problem for me. Specifically, I used the dynamicFrame.resolveChoice function:

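# Explicitly cast the columns so the Redshift write does not add
# type-suffixed columns (value_short, value2_long).
# select1 is the author's DynamicFrame from earlier in the job.
changetypes = select1.resolveChoice(
        specs=[
            ("value", "cast:int"),   # short -> int
            ("value2", "cast:int"),  # long -> int; values outside the int range may not survive, "cast:long" avoids that
        ]
    )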

It looks like you can also cast to long and short (see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-types.html). I double-checked the types of all my columns.
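Since the fix relies on every column having the intended type, the "double-check all column types" step can be automated. A minimal sketch, assuming the types are read on the Spark side with `changetypes.toDF().dtypes`, which yields `(name, type)` pairs using Spark SQL type names (`smallint` for short, `int` for integer, `bigint` for long). `find_type_mismatches` and the `expected` mapping are hypothetical.

```python
def find_type_mismatches(dtypes, expected):
    """Compare (column, type) pairs against the expected type per column.

    dtypes: list of (name, type) pairs, e.g. from DataFrame.dtypes.
    expected: dict mapping column name -> expected Spark type name.
    Returns {column: (actual, expected)} for every mismatch; a column
    missing from dtypes is reported as (None, expected).
    """
    actual = dict(dtypes)
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            mismatches[name] = (got, want)
    return mismatches


# In a Glue job this list would come from changetypes.toDF().dtypes;
# here it is written out by hand for illustration.
dtypes = [("id", "string"), ("value", "smallint"), ("value2", "bigint"), ("ts", "timestamp")]
expected = {"id": "string", "value": "int", "value2": "bigint", "ts": "timestamp"}
print(find_type_mismatches(dtypes, expected))
# {'value': ('smallint', 'int')}
```

An empty result would mean the frame already carries the intended types before it is written.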

Answer 2

The trick was casting the short values to integers. Long -> bigint seemed to work fine for me.
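This answer's rule (cast short columns to int, leave long columns alone since they map to bigint) can be turned into `resolveChoice` specs derived from the schema instead of listing columns by hand. A minimal sketch: `build_cast_specs` is a hypothetical helper, and the `schema` dict stands in for whatever type information the job reads from the frame. Only the resulting `(column, "cast:<type>")` tuples follow the `specs` format used in Answer 1.

```python
def build_cast_specs(schema, casts=None):
    """Build resolveChoice specs that cast columns by their current type.

    schema: dict mapping column name -> Glue type name (e.g. "short").
    casts: dict mapping source type -> target type; defaults to this
    answer's rule of casting short to int and leaving other types untouched.
    """
    if casts is None:
        casts = {"short": "int"}
    return [
        (name, f"cast:{casts[type_name]}")
        for name, type_name in schema.items()
        if type_name in casts
    ]


schema = {"id": "string", "value": "short", "value2": "long", "ts": "timestamp"}
specs = build_cast_specs(schema)
print(specs)  # [('value', 'cast:int')]
# In the Glue job (not runnable here): frame.resolveChoice(specs=specs)
```

Passing a different `casts` mapping, such as `{"short": "int", "long": "long"}`, covers the variant where the long columns also need an explicit cast.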

Dynamic frame writes extra columns
