I am trying to use GraphX Pregel to process hierarchical data, and my code works fine locally.
But when I run it on my Amazon EMR cluster, it gives me an error:
java.lang.NoClassDefFoundError: Could not initialize class
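Why does this happen? I know the class is in the jar file, because the code runs fine locally and there are no build errors.
I have included the GraphX dependency in my pom file.
Here is the code snippet that throws the error: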
import scala.util.hashing.MurmurHash3
import org.apache.spark.graphx.{Edge, EdgeDirection, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
def calcTopLevelHierarcy(vertexDF: DataFrame, edgeDF: DataFrame): RDD[(Any, (Int, Any, String, Int, Int))] = {
  // create the vertex RDD, keyed by a hash of the first column
  val verticesRDD = vertexDF.rdd
    .map { x => (x.get(0), x.get(1), x.get(2)) }
    .map { x => (MurmurHash3.stringHash(x._1.toString).toLong, (x._1.asInstanceOf[Any], x._2.asInstanceOf[Any], x._3.asInstanceOf[String])) }
  // create the edge RDD (top-down relationships)
  val EdgesRDD = edgeDF.rdd
    .map { x => (x.get(0), x.get(1)) }
    .map { x => Edge(MurmurHash3.stringHash(x._1.toString).toLong, MurmurHash3.stringHash(x._2.toString).toLong, "topdown") }
  // create the graph
  val graph = Graph(verticesRDD, EdgesRDD).cache()
  val pathSeperator = """/"""
  // initialize id, level, root, path, iscyclic, isleaf
  val initialMsg = (0L, 0, 0.asInstanceOf[Any], List("dummy"), 0, 1)
  val initialGraph = graph.mapVertices((id, v) => (id, 0, v._2, List(v._3), 0, v._3, 1, v._1))
  // setMsg, sendMsg and mergeMsg are defined elsewhere (not shown)
  val hrchyRDD = initialGraph.pregel(initialMsg, Int.MaxValue, EdgeDirection.Out)(setMsg, sendMsg, mergeMsg)
  // build the path from the list
  val hrchyOutRDD = hrchyRDD.vertices.map { case (id, v) => (v._8, (v._2, v._3, pathSeperator + v._4.reverse.mkString(pathSeperator), v._5, v._7)) }
  hrchyOutRDD
}
I was able to narrow the error down to this line:
val hrchyRDD = initialGraph.pregel(initialMsg,Int.MaxValue,EdgeDirection.Out)(setMsg,sendMsg,mergeMsg)
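Answer (score: 0)
I ran into the same problem: the code worked in spark-shell but failed when I ran it with spark-submit. This is the example code I was trying to run (it looks the same as yours).
The error that pointed me to the correct solution was: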
org.apache.spark.SparkException: A master URL must be set in your configuration
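In my case, I got that error because the SparkContext was defined outside the main function:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
object Test {
  // defined outside main: runs whenever object Test is initialized
  val sc = SparkContext.getOrCreate()
  val sqlContext = new SQLContext(sc)
  def main(args: Array[String]): Unit = {
    ...
  }
}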
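A plausible link between the two errors, which the answer does not spell out: fields in an object body are initialized by the JVM class initializer of that object. If an executor loads Test (for example because a closure calls one of its methods), it runs SparkContext.getOrCreate() there, where, judging by the error above, no master URL is configured, so the initializer throws. The JVM reports that first failure as an ExceptionInInitializerError and every later use of the class as java.lang.NoClassDefFoundError: Could not initialize class. The sketch below reproduces only this JVM behavior in plain Scala, without Spark; BrokenJob, InitFailureDemo and createContext are hypothetical names, and the thrown exception merely simulates Spark's message.

```scala
// Hypothetical stand-in for an object whose body calls SparkContext.getOrCreate():
// the initializer throws, simulating an executor with no master URL configured.
object BrokenJob {
  val context: String = createContext()

  private def createContext(): String =
    throw new IllegalStateException("A master URL must be set in your configuration (simulated)")
}

object InitFailureDemo {
  // Touch BrokenJob and report which error the JVM raised.
  def tryAccess(): String =
    try BrokenJob.context
    catch {
      case e: ExceptionInInitializerError =>
        s"ExceptionInInitializerError caused by: ${e.getCause.getMessage}"
      case e: NoClassDefFoundError =>
        s"NoClassDefFoundError: ${e.getMessage}"
    }

  def main(args: Array[String]): Unit = {
    println(tryAccess()) // first access: the object initializer runs and throws
    println(tryAccess()) // later accesses: the JVM does not retry the failed initializer
  }
}
```

Only the first access shows the real cause; every later access shows only "Could not initialize class", which is why the underlying error is easy to miss in executor logs.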
I was able to fix this by moving the SparkContext and sqlContext inside the main function, as described in this other post.
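The shape of that fix can be sketched in plain Scala as well, without Spark: keep only side-effect-free members such as helper functions in the object body, and create the context inside main so it is created only on the driver. ContextRegistry, FixedJob, createContext and the mergeMsg body are hypothetical; in a real job, createContext would be replaced by SparkContext.getOrCreate(...) and new SQLContext(sc) inside main.

```scala
// Hypothetical counter standing in for "how many contexts were created".
object ContextRegistry {
  var created: Int = 0
}

object FixedJob {
  // Helpers like the question's setMsg/sendMsg/mergeMsg can stay in the object;
  // this body is hypothetical. Referencing it only loads FixedJob, which now has
  // no side-effecting fields.
  def mergeMsg(a: Int, b: Int): Int = math.min(a, b)

  // Stands in for SparkContext.getOrCreate(conf) plus new SQLContext(sc).
  private def createContext(): String = {
    ContextRegistry.created += 1
    "driver-context"
  }

  def main(args: Array[String]): Unit = {
    val context = createContext() // created only when main runs, i.e. on the driver
    println(s"job running with $context, mergeMsg(3, 5) = ${mergeMsg(3, 5)}")
  }
}
```

Calling FixedJob.mergeMsg on its own, as a closure shipped to an executor would, leaves ContextRegistry.created at 0; only running main creates the context.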