我必须将AWS Glue中数据库下的约300张表转移到AWS Redshift上的dc2.large集群。不知何故,在Redshift集群中仅创建了99个表,其余查询都被中止,没有任何详细信息。
我尝试检查是否是存储问题,不是。我试图检查每个查询的详细信息,但未提供详细信息。我试图查询STL_LOAD_ERRORS,没有记录。我不知道集群类型/大小是否会影响这一点。
# constants are filled
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
logger = glueContext.get_logger()
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
client = boto3.client(service_name='glue', region_name=aws_region)
responseGetTables = client.get_tables(DatabaseName=source_db)
tableList = responseGetTables['TableList']
for tableDict in tableList:
table = tableDict['Name']
try:
datasource = glueContext.create_dynamic_frame.from_catalog(
database = source_db,
table_name = table,
transformation_ctx = "datasource"
)
datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
frame = datasource,
catalog_connection = connection_name,
connection_options = {
"dbtable": table,
"database": target_db,
"aws_iam_role": iam_role
},
redshift_tmp_dir = args["TempDir"],
transformation_ctx = "datasink"
)
except Exception as e:
pass
job.commit()
上面是从AWS Glue运行的ETL作业,该作业成功。一些查询已完成(因此创建了99个表),更多查询在30毫秒内被中止。
有人知道这件事为什么发生以及如何解决吗?谢谢!