Question

I have an AWS Glue Python job that loads data from MySQL into S3 files once the data is no longer needed.

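The job has bookmarks disabled, but I keep getting this error message:

IllegalArgumentException: "Job bookmark keys List(ip) do not match saved keys Set(id). Please use ResetJobBookmark to clear the existing job bookmark."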
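One way to double-check that bookmarks really are off is to read the job definition's default `--job-bookmark-option` and, if needed, pass `job-bookmark-disable` explicitly when starting a run. The sketch below is a minimal, hedged example using boto3's Glue client calls `get_job` and `start_job_run`; the helper names and the job name are illustrative assumptions, and boto3 is imported lazily only so a stub client can be passed in.

```python
# Value Glue accepts for --job-bookmark-option to turn bookmarks off.
BOOKMARK_DISABLED = "job-bookmark-disable"


def _glue_client(glue_client=None):
    """Return the given client, or create a real boto3 Glue client."""
    if glue_client is not None:
        return glue_client
    import boto3  # imported lazily so a stub client can be injected instead
    return boto3.client("glue")


def bookmark_option(job_name, glue_client=None):
    """Return the job's default --job-bookmark-option, or None if it is unset."""
    job = _glue_client(glue_client).get_job(JobName=job_name)["Job"]
    return job.get("DefaultArguments", {}).get("--job-bookmark-option")


def start_without_bookmarks(job_name, tables, glue_client=None):
    """Start one run with bookmarks explicitly disabled for that run only."""
    response = _glue_client(glue_client).start_job_run(
        JobName=job_name,
        Arguments={
            "--job-bookmark-option": BOOKMARK_DISABLED,
            # read inside the job by getResolvedOptions(sys.argv, [..., 'TABLES'])
            "--TABLES": tables,
        },
    )
    return response["JobRunId"]
```

For example, `bookmark_option("my-export-job")` (hypothetical job name) returning `None` only means the job definition sets no default for that argument; a per-run argument passed to `start_job_run` overrides the default for that run.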

Has anyone run into similar AWS Glue behavior?

Detailed log:

File "/mnt/yarn/usercache/root/appcache/application_1572881176134_0001/container_1572881176134_0001_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o91.getDynamicFrame.
: java.lang.IllegalArgumentException: Job bookmark keys List(ip) do not match saved keys Set(id). Use ResetJobBookmark to clear the existing Job Bookmark.
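The traceback only shows that `o91.getDynamicFrame` failed, not which entry in `TABLES` was being read at the time. A hypothetical wrapper like the one below (the names `process_all`, `raise_if_failed`, and the per-table callable are illustrative, not Glue APIs) keeps going past a failing table and then reports every table that errored, which would show which table carries the `ip` key. In the job, `process_one` would be the existing loop body moved into a function.

```python
def process_all(tables, process_one):
    """Call process_one(table) for every table and collect failures by name.

    Py4JJavaError (raised through Glue's getDynamicFrame) subclasses
    Exception, so it is caught here along with any other error.
    """
    failures = {}
    for table in tables:
        try:
            process_one(table)
        except Exception as exc:
            failures[table] = str(exc)
    return failures


def raise_if_failed(failures):
    """Fail the job at the end, listing every table that errored."""
    if failures:
        details = "; ".join(f"{t}: {msg}" for t, msg in failures.items())
        raise RuntimeError(f"{len(failures)} table(s) failed: {details}")
```

Raising at the end keeps the job run marked as failed while still writing the tables that succeeded.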

I have reset the job bookmark for the job, but it didn't help.
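For reference, the reset can also be done from code and the stored state read back afterwards, to check whether anything referencing `id` is still saved (the CLI equivalent of the reset is `aws glue reset-job-bookmark --job-name <name>`). This is a minimal sketch using boto3's `reset_job_bookmark` and `get_job_bookmark`; the helper names are illustrative, and the `JobBookmark` value is returned as the raw string because its internal format is not something this sketch assumes.

```python
def _glue_client(glue_client=None):
    """Return the given client, or create a real boto3 Glue client."""
    if glue_client is not None:
        return glue_client
    import boto3  # imported lazily so a stub client can be injected instead
    return boto3.client("glue")


def reset_bookmark(job_name, glue_client=None):
    """Call ResetJobBookmark and return the resulting bookmark entry."""
    response = _glue_client(glue_client).reset_job_bookmark(JobName=job_name)
    return response["JobBookmarkEntry"]


def saved_bookmark(job_name, glue_client=None):
    """Return the raw JobBookmark string Glue currently stores for the job."""
    response = _glue_client(glue_client).get_job_bookmark(JobName=job_name)
    return response["JobBookmarkEntry"].get("JobBookmark")
```

Reading the state right after the reset, and again after the next failing run, would show whether the mismatching state survives the reset or is written again during the run itself.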

The code I'm using:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME, TABLES]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLES'])

# initialize the Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

#create a list from input argument
tables = args['TABLES'].split(",")

#loop through table list
for table in tables:
    table = table.strip() #remove spaces
    # read the table through the Glue Data Catalog
    datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "prod", table_name = ("prod_" + table), transformation_ctx = "datasource0")
    # drop fields that are null in every record
    dropnullfields3 = DropNullFields.apply(frame = datasource0, transformation_ctx = "dropnullfields3")
    #only one output file
    dropnullfields3 = dropnullfields3.repartition(1)
    #set path
    s3_path = "s3://aws-glue-output1/" + table
    datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": s3_path}, format = "parquet", transformation_ctx = "datasink4")

job.commit()
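One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): the loop reuses `transformation_ctx = "datasource0"` for every table, and Glue uses `transformation_ctx` as the key for bookmark state, so tables keyed on different columns (one on `id`, another on `ip`) could be colliding under that single name. The sketch below gives each table its own context names; `plan_tables` and `run` are hypothetical helpers, and `awsglue` is imported inside `run` because it only exists in the Glue runtime.

```python
def plan_tables(tables_arg, output_prefix="s3://aws-glue-output1/"):
    """Turn the comma-separated TABLES argument into per-table settings."""
    plans = []
    for raw in tables_arg.split(","):
        table = raw.strip()
        if not table:
            continue  # tolerate stray commas such as "a,b,"
        plans.append({
            "table": table,
            "catalog_table": "prod_" + table,
            "source_ctx": "datasource_" + table,  # unique per table
            "dropnull_ctx": "dropnull_" + table,
            "sink_ctx": "datasink_" + table,
            "s3_path": output_prefix + table,
        })
    return plans


def run(glue_context, plans):
    """Same read/clean/write steps as the original loop, one ctx per table."""
    from awsglue.transforms import DropNullFields  # Glue runtime only
    for p in plans:
        frame = glue_context.create_dynamic_frame.from_catalog(
            database="prod",
            table_name=p["catalog_table"],
            transformation_ctx=p["source_ctx"],
        )
        frame = DropNullFields.apply(frame=frame, transformation_ctx=p["dropnull_ctx"])
        frame = frame.repartition(1)  # one output file per table
        glue_context.write_dynamic_frame.from_options(
            frame=frame,
            connection_type="s3",
            connection_options={"path": p["s3_path"]},
            format="parquet",
            transformation_ctx=p["sink_ctx"],
        )
```

In the job this would replace the loop with `run(glueContext, plan_tables(args['TABLES']))` before `job.commit()`.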

AWS Glue bookmark error even when bookmarks are disabled
