We want to use an AWS Glue job to filter JSON messages stored in an S3 bucket.
Here are some sample JSON records:
{ "property": {"subproperty1": "A", "subproperty2": "B" }}
{ "property": {"subproperty1": "C", "subproperty2": "D" }}
We want to keep only the records where subproperty1 is in ["A", "B"]. This is what we tried:
applyFilter1 = Filter.apply(
    frame = datasource0,
    f = lambda x: x["property.subproperty1"] in ["A", "B"]
)
We then write the result to a new S3 bucket, as follows:
datasink2 = glueContext.write_dynamic_frame.from_options(
    frame = applyFilter1,
    connection_type = "s3",
    connection_options = {"path": "s3://<my-s3-location>"},
    format = "json",
    transformation_ctx = "datasink2"
)
Unfortunately, the resulting files are empty. Any ideas? Does the AWS Glue Filter transform support nested expressions?
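
To narrow the problem down, here is a minimal plain-Python sketch of the predicate that runs without Glue. It assumes each record behaves like a nested Python dict, which may or may not match how Glue presents records to the filter function. The names keep_record, dotted_lookup_matches, and ALLOWED are hypothetical helpers introduced for illustration only. It compares a lookup with a single dotted key, as in our attempt, against a step-by-step nested lookup.

```python
# Plain-Python stand-in for the sample records; no Glue dependency.
records = [
    {"property": {"subproperty1": "A", "subproperty2": "B"}},
    {"property": {"subproperty1": "C", "subproperty2": "D"}},
]

ALLOWED = {"A", "B"}


def keep_record(record):
    """Return True when the nested property.subproperty1 value is allowed."""
    # Walk the nesting one level at a time; tolerate a missing "property".
    prop = record.get("property") or {}
    return prop.get("subproperty1") in ALLOWED


def dotted_lookup_matches(record):
    """Mirror the original attempt: one key that contains a dot."""
    # A plain dict has no key named "property.subproperty1", so this is None.
    return record.get("property.subproperty1") in ALLOWED


kept = [r for r in records if keep_record(r)]
kept_dotted = [r for r in records if dotted_lookup_matches(r)]

print(len(kept), len(kept_dotted))  # prints: 1 0
```

On plain dicts the dotted key matches nothing, which would be consistent with the empty output we see. If Glue records do behave like nested dicts, passing f = keep_record to Filter.apply might be worth trying, although that is untested here.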