Getting java.lang.ClassCastException when trying to submit a Spark job from Airflow with SparkSubmitOperator

Asked: 2019-06-24 13:55:12

Tags: apache-spark airflow

Here are the DAG and the operator (imports added for completeness):

import airflow
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(
    dag_id='syncronizer',
    default_args=args,
    schedule_interval='0 6 * * *',
    catchup=False
)

SparkSubmitOperator(
    task_id='syncronization_task',
    spark_binary='/opt/spark/bin/spark-submit',
    total_executor_cores=6,
    executor_cores=1,
    num_executors=6,
    application='.../synchronizer.jar',
    name='syncronizer',
    verbose=True,
    conf={
        'spark.submit.deployMode': 'client',
        'spark.serializer': 'org.apache.spark.serializer.KryoSerializer',
        'spark.dynamicAllocation.enabled': 'true',
        'spark.shuffle.service.enabled': 'true'
    },
    dag=dag
)

Executing it raises the following exception:

java.lang.ClassCastException: cannot assign instance of org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11 to field org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.cleanedF$2 of type scala.Function2 in instance of org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25

Submitting this job manually or via BashOperator works without any problem.
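For comparison, the manual submission the question says works would look roughly like the command below. This is a minimal sketch (plain Python, no Airflow required) that assembles the equivalent `spark-submit` invocation from the same operator arguments; the jar path is elided in the question and kept as-is here, so it is a placeholder, not a real path.

```python
# Sketch: the manual spark-submit command equivalent to the
# SparkSubmitOperator arguments above (suitable for BashOperator's
# bash_command). All flag values mirror the operator's kwargs.
conf = {
    'spark.submit.deployMode': 'client',
    'spark.serializer': 'org.apache.spark.serializer.KryoSerializer',
    'spark.dynamicAllocation.enabled': 'true',
    'spark.shuffle.service.enabled': 'true',
}

cmd = [
    '/opt/spark/bin/spark-submit',
    '--total-executor-cores', '6',
    '--executor-cores', '1',
    '--num-executors', '6',
    '--name', 'syncronizer',
]
for key, value in conf.items():
    # each Spark property becomes a --conf key=value pair
    cmd += ['--conf', f'{key}={value}']
cmd.append('.../synchronizer.jar')  # path elided in the question

print(' '.join(cmd))
```

Comparing the environment this command runs in (PATH, SPARK_HOME, the Spark jars on the classpath) with the environment the Airflow worker uses when it invokes `spark_binary` is one way to narrow down why the two submission paths behave differently.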

0 answers