ClassNotFoundException when running KMeans on Spark

Posted: 2016-07-07 16:09:07

Tags: scala apache-spark

I am trying to submit a Spark job that uses Spark's KMeans. I package the Scala file correctly, but whenever I submit the job I get a ClassNotFoundException. Here is my sbt file:

名:=" sparkKmeans"

libraryDependencies + =" org.apache.spark" %%" spark-core" %" 1.1.1"
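
Note that the class shown next imports org.apache.spark.mllib, which lives in the separate spark-mllib artifact, so spark-core alone is not enough to compile it. A minimal build.sbt sketch for this setup (the scalaVersion value is an assumption and must match the Scala version your Spark distribution was built with):

name := "sparkKmeans"

// Assumed: the prebuilt Spark 1.x downloads are built with Scala 2.10.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.1.1",
  // KMeans, KMeansModel and Vectors come from MLlib, not from core.
  "org.apache.spark" %% "spark-mllib" % "1.1.1"
)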

Here is my Scala class:



import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object sparkKmeans {
  def main(args: Array[String]) {
    // Create the Spark context with the Spark configuration.
    val sc = new SparkContext(new SparkConf().setAppName("SparkKmeans"))
    //val threshold = args(1).toInt

    // Load and parse the data; the source file is the first argument.
    val data = sc.textFile(args(0))
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    // Cluster the data with KMeans. The number of iterations is fixed at 100,
    // and the number of clusters comes from the second argument.
    val numClusters = args(1).toInt
    val numIterations = 100
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate the clustering by computing the Within Set Sum of Squared Errors.
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)

    // Save and load the model based on the third argument.
    //clusters.save(sc, args(2))
    //val sameModel = KMeansModel.load(sc, args(2))
  }
}
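
A side note on the input format: the map above assumes each line of the input file is a space-separated list of numbers, one point per line. A tiny REPL-style illustration of what parsedData ends up holding (the sample values are invented):

import org.apache.spark.mllib.linalg.Vectors

// One line of the input file; the values here are made up.
val line = "6.0 148.0 72.0 35.0"

// The same transformation the map above applies to every line.
val point = Vectors.dense(line.split(' ').map(_.toDouble))

println(point) // prints [6.0,148.0,72.0,35.0]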

I commented out the last two lines of the class (the model save and load) because I read in a few places that Spark has problems with the serializer there, but the error remains.

Here is the error:

java.lang.ClassNotFoundException: sparkKmeans
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:278)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I would really appreciate it if anyone could help me.

1 Answer:

Answer 0 (score: 0)

Thanks for your comment. I did what you said. Here is my build.sbt now:

name := "sparkKmeans"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1",
  "org.apache.spark" % "spark-mllib_2.10" % "1.6.1"
)

(I used Scala 2.11.8 and Spark 1.6.1, but I still get the same error.) One more point: I package my application with:

sbt compile package

and run it with:

./bin/spark-submit --class sparkKmeans k/kmeans/target/scala-2.10/sparkkmeans_2.10-0.1-SNAPSHOT.jar  '/home/meysam/spark-1.6.1/kmeans/pima.csv' 3
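
One detail worth double-checking in this setup: the dependency pins spark-mllib_2.10 while the compiler is Scala 2.11.8, and the jar being submitted comes from target/scala-2.10. The Scala version of the jar must match the one the Spark installation was built with (the prebuilt 1.6.1 downloads are built against 2.10 by default). A build.sbt sketch that keeps everything on a single Scala version (2.10.6 here is an assumption; adjust it to your Spark build):

name := "sparkKmeans"

// Must match the Scala version of the Spark installation.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // %% appends the matching _2.10/_2.11 suffix automatically.
  "org.apache.spark" %% "spark-core"  % "1.6.1",
  "org.apache.spark" %% "spark-mllib" % "1.6.1"
)

With a consistent build like this, sbt package should write the jar under target/scala-2.10/ and spark-submit should be able to resolve the sparkKmeans class from it.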