I want to test my AWS Glue PySpark job on a small subset of the available data. How can I achieve this?
My first attempt was to convert the Glue DynamicFrame to a Spark DataFrame and use the take(n) method to limit the number of rows processed, like this:
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="my_db",
    table_name="my_table",
    transformation_ctx="ds0")
applymapping1 = ApplyMapping.apply(
    frame=datasource0,
    mappings=[("foo", "string", "bar", "string")],
    transformation_ctx="am1")
truncated_df = applymapping1.toDF().take(1000)
datasink2 = glueContext.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(truncated_df, glueContext, "tdf"),
    connection_type="s3",
    ...)
job.commit()
This fails with the following error message:
AttributeError: 'list' object has no attribute '_jdf'
Any ideas?
Answer 0 (score: 1)
The problem is that take(n) does not return a DataFrame: it collects the rows to the driver and returns a plain Python list of Row objects, which is why DynamicFrame.fromDF() fails with the '_jdf' error. Truncate the DataFrame as a separate step with limit(n), which returns a DataFrame, and then pass the resulting DynamicFrame to the datasink.
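A minimal sketch of the fix, reusing the names from the question; the S3 path and output format in connection_options are illustrative placeholders, not values from the original job:

```python
from awsglue.dynamicframe import DynamicFrame

# limit(n) returns a DataFrame (unlike take(n), which returns a list of
# Row objects), so DynamicFrame.fromDF() can convert it back.
truncated_df = applymapping1.toDF().limit(1000)
truncated_dyf = DynamicFrame.fromDF(truncated_df, glueContext, "tdf")

datasink2 = glueContext.write_dynamic_frame.from_options(
    frame=truncated_dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/out/"},  # hypothetical path
    format="json")
job.commit()
```

Note that limit(n) stays lazy and distributed, so the truncation happens inside Spark rather than pulling 1000 rows onto the driver and pushing them back.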