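I have an AWS Glue Python job that loads data from MySQL into S3 files once the data is no longer needed.
Job bookmarks are disabled for the job, but I keep getting this error message:
IllegalArgumentException: "Job bookmark keys List(ip) do not match the saved keys Set(id). Please use ResetJobBookmark to clear the existing job bookmark."
Has anyone run into similar AWS Glue behavior?
Detailed log: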
  File "/mnt/yarn/usercache/root/appcache/application_1572881176134_0001/container_1572881176134_0001_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o91.getDynamicFrame.
: java.lang.IllegalArgumentException: Job bookmark keys List(ip) do not match the saved keys Set(id). Use ResetJobBookmark to clear the existing Job Bookmark.
I have reset the job bookmark for the job, but that did not help.
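For reference, the check-and-reset step can be sketched roughly as follows with boto3. This is a minimal sketch, not necessarily how the reset was actually performed: the helper name `reset_bookmark` and the job name in the usage comment are placeholders, and `DefaultArguments` only shows the job's default bookmark option, which a per-run argument could override.

```python
def reset_bookmark(glue_client, job_name):
    """Report the job's default bookmark option, then reset its bookmark.

    glue_client is expected to behave like boto3.client("glue").
    """
    job = glue_client.get_job(JobName=job_name)["Job"]
    # "--job-bookmark-option" is the special job parameter that enables,
    # disables, or pauses bookmarks; it may be absent from the defaults.
    option = job.get("DefaultArguments", {}).get(
        "--job-bookmark-option", "<not set>"
    )
    print(f"{job_name}: --job-bookmark-option = {option}")

    # Clears the stored bookmark state for the whole job.
    response = glue_client.reset_job_bookmark(JobName=job_name)
    return option, response.get("JobBookmarkEntry", {})


# Usage (needs boto3, AWS credentials and a region; the job name is a placeholder):
#   import boto3
#   reset_bookmark(boto3.client("glue"), "my-glue-job")
```

Passing the client in as a parameter keeps the helper easy to try against a stub before pointing it at a real account.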
The code I'm using:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'TABLES'])
#connection init
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
#create a list from input argument
tables = args['TABLES'].split(",")
#loop through table list
for table in tables:
    table = table.strip() #remove spaces
    #connect to table
    datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "prod", table_name = ("prod_" + table), transformation_ctx = "datasource0")
    #drop fields
    dropnullfields3 = DropNullFields.apply(frame = datasource0, transformation_ctx = "dropnullfields3")
    #only one output file
    dropnullfields3 = dropnullfields3.repartition(1)
    #set path
    s3_path = "s3://aws-glue-output1/" + table
    datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": s3_path}, format = "parquet", transformation_ctx = "datasink4")
job.commit()
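
One detail in the script that may be worth checking: every loop iteration passes the same `transformation_ctx` strings ("datasource0", "dropnullfields3", "datasink4"), and the AWS Glue documentation describes `transformation_ctx` as the key under which bookmark state is stored. If the tables have different bookmark keys (for example `ip` in one and `id` in another), that reuse could be related to the mismatch. This is a guess, not a confirmed cause. A minimal sketch of deriving a distinct context name per table, where `ctx_for` is a hypothetical helper and the table names are made up:

```python
def ctx_for(step, table):
    """Build a transformation_ctx string unique to a step and a table.

    Characters other than letters and digits are replaced with "_" so the
    result stays a simple identifier-like string.
    """
    safe = "".join(ch if ch.isalnum() else "_" for ch in table.strip())
    return f"{step}_{safe}"


# Hypothetical table list, parsed the same way as the TABLES argument.
tables = [t.strip() for t in "users, access_log".split(",")]
for table in tables:
    # In the Glue script these values would be passed as transformation_ctx.
    print(ctx_for("datasource0", table), ctx_for("datasink4", table))
```

In the job itself, the idea would be to pass `ctx_for("datasource0", table)` instead of the fixed string `"datasource0"`, and likewise for the other steps.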