My code looks like this:
import java.io.{File, FileReader}
import java.util.Properties

import org.apache.spark.sql.{Row, SaveMode}
import org.json4s.DefaultFormats

import scala.collection.mutable.ArrayBuffer

object ErrorTest {
  case class APIResults(status: String, col_1: Long, col_2: Double, ...)

  def funcA(rows: ArrayBuffer[Row])(implicit defaultFormats: DefaultFormats): ArrayBuffer[APIResults] = {
    // call some API, get the results and return them as APIResults
    ...
  }

  // MARK: load properties
  val props = loadProperties()

  private def loadProperties(): Properties = {
    val configFile = new File("config.properties")
    val reader = new FileReader(configFile)
    val props = new Properties()
    props.load(reader)
    props
  }

  def main(args: Array[String]): Unit = {
    val prop_a = props.getProperty("prop_a")
    val session = Context.initialSparkSession()
    import session.implicits._

    val initialSet = ArrayBuffer.empty[Row]
    val addToSet = (s: ArrayBuffer[Row], v: Row) => (s += v)
    val mergePartitionSets = (p1: ArrayBuffer[Row], p2: ArrayBuffer[Row]) => (p1 ++= p2)

    val sql1 = s""" select * from tbl_a where ... """

    session.sql(sql1)
      .rdd.map { row => { implicit val formats = DefaultFormats; (row.getLong(6), row) } }
      .aggregateByKey(initialSet)(addToSet, mergePartitionSets)
      .repartition(40)
      .map { case (rowNumber, rows) => { implicit val formats = DefaultFormats; funcA(rows) } }
      .flatMap(x => x)
      .toDF()
      .write.mode(SaveMode.Overwrite).saveAsTable("tbl_b")
  }
}
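When I run it with spark-submit, it fails with java.lang.NoClassDefFoundError: Could not initialize class staging_jobs.ErrorTest$. However, if I move val props = loadProperties() to the first line of the main method, the error no longer occurs. Can anyone explain this behavior? Thanks a lot!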
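A minimal, Spark-free sketch of the workaround, assuming the executors only need funcA (and at most the value of prop_a): the object body performs no file I/O, and the properties are loaded inside main, which runs only on the driver. FixedJob, the simplified funcA and the config path are illustrative placeholders, not the original job.

```scala
import java.io.{File, FileReader}
import java.util.Properties

// Hypothetical stand-in for ErrorTest after the fix: the object body has no
// side effects, so initializing FixedJob$ on any JVM (driver or executor)
// cannot fail because a config file is missing there.
object FixedJob {
  // Reads the file only when explicitly called.
  def loadProperties(path: String): Properties = {
    val reader = new FileReader(new File(path))
    try {
      val props = new Properties()
      props.load(reader)
      props
    } finally reader.close()
  }

  // Stand-in for funcA: safe to call from closures that run on executors.
  def funcA(x: Long): Long = x * 2

  def main(args: Array[String]): Unit = {
    // Runs on the driver only, where config.properties is expected to exist.
    val props = loadProperties("config.properties")
    // If a Spark closure needs prop_a, let it capture this local String
    // instead of reading a member of the enclosing object.
    val propA = props.getProperty("prop_a")
    println(s"prop_a = $propA, funcA(21) = ${funcA(21)}")
  }
}
```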
Caused by: java.lang.NoClassDefFoundError: Could not initialize class staging_jobs.ErrorTest$
at staging_jobs.ErrorTest$$anonfun$main$1.apply(ErrorTest.scala:208)
at staging_jobs.ErrorTest$$anonfun$main$1.apply(ErrorTest.scala:208)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
... 8 more
Answer 0 (score: 2)
I ran into the same problem. I defined a method convert outside the main method. When I used it inside main as dataframe.rdd.map{x => convert(x)}, I got NoClassDefFoundError: Could not initialize class Test$.
But when I instead used a function object convertor (with the same code as the convert method) inside main, the error did not occur.
I'm using Spark 2.1.0 and Scala 2.11. Could this be a bug in Spark?
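A Spark-free sketch of why the function object can behave differently from the method: calling convert requires the JVM to initialize the object that defines it (and thus run any failing initializer in that object), while invoking a function value whose class and body never refer to that object does not. Because the way a lambda written inside main is compiled differs between Scala versions, this sketch keeps the function value in a separate, side-effect-free object; all names are hypothetical.

```scala
import java.io.FileReader

// Hypothetical stand-in for answer 0's Test object: its initializer fails on a
// machine without the config file, as it would on an executor.
object TestWithConfig {
  val config = new FileReader("missing-on-this-machine.properties") // throws during initialization
  // Calling this method requires TestWithConfig$ to be initialized first.
  def convert(x: Int): Int = x + 1
}

// Function value held in an object with no side effects: invoking it never
// touches TestWithConfig$, so that failing initializer never runs.
object Converters {
  val convertor: Int => Int = x => x + 1
}

object MethodVsFunctionDemo {
  def main(args: Array[String]): Unit = {
    println(Converters.convertor(1)) // prints 2
    try println(TestWithConfig.convert(1))
    catch { case e: LinkageError => println(s"convert failed: $e") }
  }
}
```

Running MethodVsFunctionDemo prints 2 for the function value and then reports an initialization error for convert.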
Answer 1 (score: 0)
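I think the problem is that val props = loadProperties() defines a member of the enclosing object (outside main). This member is then serialized to (or evaluated on) the executors, which don't have the same environment as the driver.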
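A single-JVM simulation of that failure mode. The closure passed to rdd.map calls funcA, a method of ErrorTest, so each executor JVM has to initialize ErrorTest$, which evaluates val props = loadProperties() there; if config.properties is not in the executor's working directory, that initializer throws. The JVM reports the real cause only once, as an ExceptionInInitializerError; every later access to the class in the same JVM gets NoClassDefFoundError: Could not initialize class. That is likely why the posted trace shows only the latter, with the original FileNotFoundException in an earlier task failure or executor log. BrokenJob and InitFailureDemo are hypothetical names.

```scala
import java.io.{File, FileReader}
import java.util.Properties

// Hypothetical stand-in for ErrorTest: `props` is evaluated while the JVM
// initializes BrokenJob$, just like `val props = loadProperties()`.
object BrokenJob {
  val props: Properties = {
    // Simulates an executor whose working directory has no config file.
    val reader = new FileReader(new File("no-such-config.properties")) // throws FileNotFoundException
    val p = new Properties()
    try p.load(reader) finally reader.close()
    p
  }
  // Stand-in for funcA: it never reads props, yet calling it still requires
  // BrokenJob$ to be initialized.
  def funcA(x: Long): Long = x * 2
}

object InitFailureDemo {
  // Calls BrokenJob.funcA once and returns the error, if any.
  def attempt(): Option[LinkageError] =
    try { BrokenJob.funcA(1L); None }
    catch { case e: LinkageError => Some(e) }

  def main(args: Array[String]): Unit = {
    // First access: the initializer's exception, wrapped in ExceptionInInitializerError.
    println(attempt())
    // Later accesses in the same JVM: NoClassDefFoundError "Could not initialize class BrokenJob$".
    println(attempt())
  }
}
```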