I'm trying to submit a Spark job through YARN on a cluster, but I keep getting a NoClassDefFoundError for one of my custom classes:
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.k2.dataIngestion.transformations.types.NodeMapper$
at org.k2.dataIngestion.transformations.types.NodeMapper$$anonfun$load$1.apply(NodeMapper.scala:39)
at org.k2.dataIngestion.transformations.types.NodeMapper$$anonfun$load$1.apply(NodeMapper.scala:33)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.mapelements_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.deserializetoobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
... 3 more
ApplicationMaster host: 192.168.1.130
ApplicationMaster RPC port: 0
queue: default
start time: 1576850235167
final status: FAILED
tracking URL: http://bd2-master01.k2.it:8088/proxy/application_1576657683834_0058/
user: root
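For reference, I submit the job roughly like this (a sketch: the jar name is a placeholder and the deploy mode may differ, but the main class matches the one configured in the shade plugin below):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.k2.dataIngestion.transformations.Main \
  data-ingestion-uber.jar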
Here is the small piece of code where the execution crashes:
dataset = ss.read.parquet(path)
  .drop(droppable: _*)
  .map(row => {
    val id = row.getAs[Long]("id")
    val lon = row.getAs[Double]("longitude")
    val lat = row.getAs[Double]("latitude")
    val tags = row.getAs[Seq[Row]]("tags")
    Row(geohash(lon, lat), id, wkt(lon, lat), tags) // <-- the exception is thrown here
  })(RowEncoder.apply(schema))
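For context, the stack trace points into NodeMapper.load (NodeMapper.scala:33-39), so the snippet above lives inside the NodeMapper object, and as I understand it "Could not initialize class NodeMapper$" means the object's static initializer already failed once on that executor JVM. A minimal sketch of the object's shape; everything beyond the names visible in the stack trace (the schema, droppable, and the geohash/wkt helpers) is a placeholder:

import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Dataset, Row, SparkSession}

object NodeMapper {
  // Everything defined at object level runs in NodeMapper$'s static
  // initializer. If that initializer throws once on an executor JVM,
  // every later use of the object there surfaces as
  // "NoClassDefFoundError: Could not initialize class ...NodeMapper$".
  private val droppable: Seq[String] = Seq("version", "changeset") // placeholder columns
  private val schema: StructType = StructType(Seq(                 // placeholder schema
    StructField("geohash", StringType),
    StructField("id", LongType),
    StructField("wkt", StringType),
    StructField("tags", ArrayType(StructType(Seq(
      StructField("key", StringType),
      StructField("value", StringType)))))))

  // Placeholder helpers standing in for the real geohash/WKT logic.
  private def geohash(lon: Double, lat: Double): String = f"$lat%.5f,$lon%.5f"
  private def wkt(lon: Double, lat: Double): String = s"POINT($lon $lat)"

  def load(ss: SparkSession, path: String): Dataset[Row] =
    ss.read.parquet(path)
      .drop(droppable: _*)
      .map(row => {
        val id = row.getAs[Long]("id")
        val lon = row.getAs[Double]("longitude")
        val lat = row.getAs[Double]("latitude")
        val tags = row.getAs[Seq[Row]]("tags")
        Row(geohash(lon, lat), id, wkt(lon, lat), tags)
      })(RowEncoder.apply(schema))
}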
I'm actually using the maven-shade-plugin to build an uber-jar:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.0</version>
  <configuration>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <!-- Run shade goal on package phase -->
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- add Main-Class to manifest file -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>org.k2.dataIngestion.transformations.Main</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
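Since the shade goal is bound to the package phase above, the uber-jar is produced with a plain build:

mvn clean package

The META-INF/*.SF, *.DSA, and *.RSA excludes are there to strip signature files from signed dependencies; without them, the merged jar fails at runtime with an invalid-signature SecurityException.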
The "funny" part is that if I remove the .map() operation inside the object, no exception is thrown. Is there something I'm missing?