NoClassDefFoundError when submitting a Spark job

Asked: 2019-12-20 14:20:30

Tags: scala apache-spark

I am trying to submit a Spark job via YARN on a cluster, but I keep getting a NoClassDefFoundError on one of my own classes:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.k2.dataIngestion.transformations.types.NodeMapper$
at org.k2.dataIngestion.transformations.types.NodeMapper$$anonfun$load$1.apply(NodeMapper.scala:39)
at org.k2.dataIngestion.transformations.types.NodeMapper$$anonfun$load$1.apply(NodeMapper.scala:33)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.mapelements_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.deserializetoobject_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
... 3 more

 ApplicationMaster host: 192.168.1.130
 ApplicationMaster RPC port: 0
 queue: default
 start time: 1576850235167
 final status: FAILED
 tracking URL: http://bd2-master01.k2.it:8088/proxy/application_1576657683834_0058/
 user: root
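
From what I understand, "Could not initialize class X$" means the object's static initializer has already failed once in that JVM: the first access throws ExceptionInInitializerError carrying the real cause, and every later access reports only NoClassDefFoundError. A minimal, self-contained sketch of that JVM behavior (the names here are made up and have nothing to do with my job):

    // Hypothetical reproducer of the JVM behavior, unrelated to my actual classes.
    object Foo {
      // Eager val: runs inside Foo$'s static initializer and throws
      // NoSuchElementException when the env var is absent.
      val n: Int = sys.env("MISSING_VAR").toInt
    }

    object Repro extends App {
      def touch(): Unit =
        try { Foo.n; () } catch { case t: Throwable => println(t) }

      touch() // java.lang.ExceptionInInitializerError (wraps the real cause)
      touch() // java.lang.NoClassDefFoundError: Could not initialize class Foo$
    }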

Here is the small piece of code whose execution crashes:

    dataset = ss.read.parquet(path)
      .drop(droppable: _*)
      .map(row => {
        val id = row.getAs[Long]("id")
        val lon = row.getAs[Double]("longitude")
        val lat = row.getAs[Double]("latitude")
        val tags = row.getAs[Seq[Row]]("tags")

        Row(geohash(lon, lat), id, wkt(lon, lat), tags) // <-- Here exception is thrown
      })(RowEncoder.apply(schema))
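
For context, geohash, wkt, schema and droppable are presumably defined in NodeMapper (the stack trace points at NodeMapper.load). Simplified stand-ins would look like this; the exact fields and signatures here are my illustration, not the real code:

    import org.apache.spark.sql.types._

    // Illustrative stand-ins only; the real implementations live in NodeMapper.
    val droppable: Seq[String] = Seq("version", "changeset")

    val schema: StructType = StructType(Seq(
      StructField("geohash", StringType),
      StructField("id", LongType),
      StructField("wkt", StringType),
      StructField("tags", ArrayType(StructType(Seq(
        StructField("key", StringType),
        StructField("value", StringType)))))))

    def wkt(lon: Double, lat: Double): String = s"POINT($lon $lat)"
    def geohash(lon: Double, lat: Double): String = f"$lat%.5f:$lon%.5f" // placeholder, not a real geohash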

For the record, I am generating an uber-jar with maven-shade-plugin:
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.0</version>
        <configuration>
            <filters>
                <filter>
                    <artifact>*:*</artifact>
                    <excludes>
                        <exclude>META-INF/*.SF</exclude>
                        <exclude>META-INF/*.DSA</exclude>
                        <exclude>META-INF/*.RSA</exclude>
                    </excludes>
                </filter>
            </filters>
        </configuration>
        <executions>
            <!-- Run shade goal on package phase -->
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <transformers>
                        <!-- add Main-Class to manifest file -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>org.k2.dataIngestion.transformations.Main</mainClass>
                        </transformer>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>

The "funny" part is that if I remove the .map() operation inside the object, no exception is thrown. Am I missing something?
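
In case it helps with a diagnosis: my working hypothesis (just a guess) is that the .map closure is the first code on the executors that touches NodeMapper$, so its static initializer runs there and fails, while without the map nothing on an executor ever references the object. If that's right, deferring the object's eager state should at least surface the real cause:

    // Hypothetical: NodeMapper's real fields are not shown above.
    object NodeMapper {
      // An eager val like this runs during static initialization on every JVM
      // that touches the object -- executors included -- and a failure there
      // later surfaces as "Could not initialize class NodeMapper$":
      // val precision: Int = sys.env("GEOHASH_PRECISION").toInt

      // A lazy val is evaluated on first access instead, so a failure keeps
      // its own stack trace and points at the real cause:
      lazy val precision: Int = sys.env.getOrElse("GEOHASH_PRECISION", "8").toInt
    }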

0 Answers:

No answers yet