Here is the DAG and the operator:
import airflow
from airflow import DAG
# Airflow 1.x import path; on Airflow 2.x use
# airflow.providers.apache.spark.operators.spark_submit instead
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(
    dag_id='syncronizer',
    default_args=args,
    schedule_interval='0 6 * * *',
    catchup=False,
)

SparkSubmitOperator(
    task_id='syncronization_task',
    spark_binary='/opt/spark/bin/spark-submit',
    total_executor_cores=6,
    executor_cores=1,
    num_executors=6,
    application='.../synchronizer.jar',
    name='syncronizer',
    verbose=True,
    conf={
        'spark.submit.deployMode': 'client',
        'spark.serializer': 'org.apache.spark.serializer.KryoSerializer',
        'spark.dynamicAllocation.enabled': 'true',
        'spark.shuffle.service.enabled': 'true',
    },
    dag=dag,
)
When it executes, I get this exception:
java.lang.ClassCastException: cannot assign instance of org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11 to field org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.cleanedF$2 of type scala.Function2 in instance of org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25
Submitting the same job manually or via a BashOperator works without any problem.
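For reference, a sketch of the manual/BashOperator submission path that works. The flags below mirror the SparkSubmitOperator settings shown above; the helper name `build_spark_submit_cmd` and the BashOperator wiring are illustrative assumptions, since the question does not show them.

```python
# Build a spark-submit command line equivalent to the SparkSubmitOperator
# configuration above. To reproduce the working BashOperator path, this
# string would be passed as bash_command (wiring assumed, not from the post).
def build_spark_submit_cmd(jar_path):
    parts = [
        '/opt/spark/bin/spark-submit',
        '--deploy-mode', 'client',
        '--name', 'syncronizer',
        '--total-executor-cores', '6',
        '--executor-cores', '1',
        '--num-executors', '6',
        '--conf', 'spark.serializer=org.apache.spark.serializer.KryoSerializer',
        '--conf', 'spark.dynamicAllocation.enabled=true',
        '--conf', 'spark.shuffle.service.enabled=true',
        jar_path,  # path elided in the question; kept elided here
    ]
    return ' '.join(parts)

print(build_spark_submit_cmd('.../synchronizer.jar'))
```

Comparing this command line against what SparkSubmitOperator actually launches (the operator logs it when `verbose=True`) may help spot the difference; a `ClassCastException` of this shape is commonly associated with a Spark version mismatch between the submitting client and the cluster, though I cannot confirm that is the cause here.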