I am trying to submit a Spark job that uses Spark KMeans. I package the Scala file without errors, but when I submit the job I always get a ClassNotFoundException. Here is my sbt file:
名:=" sparkKmeans"
libraryDependencies + =" org.apache.spark" %%" spark-core" %" 1.1.1"
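Note that the class below also imports org.apache.spark.mllib, which spark-core alone does not provide, so the build presumably also needs an MLlib dependency; a minimal sketch, assuming the same 1.1.1 version:

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.1.1"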
And here is my Scala class:
(I have commented out the last two lines because I saw in some places that Spark has problems with serializers there, but the problem persists.)
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
object sparkKmeans {
  def main(args: Array[String]) {
    // Create the Spark context with a Spark configuration.
    val sc = new SparkContext(new SparkConf().setAppName("SparkKmeans"))
    //val threshold = args(1).toInt
    // Load and parse the data; the source path is the first argument.
    val data = sc.textFile(args(0))
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
    // Cluster the data with KMeans. The number of iterations is fixed at 100;
    // the number of clusters comes from the second argument.
    val numClusters = args(1).toInt // args are Strings; KMeans.train expects an Int
    val numIterations = 100
    val clusters = KMeans.train(parsedData, numClusters, numIterations)
    // Evaluate the clustering by computing the Within Set Sum of Squared Errors.
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)
    // Save and load the model based on the third argument.
    //clusters.save(sc, args(2))
    //val sameModel = KMeansModel.load(sc, args(2))
  }
}
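For reference, the parsing step above (s.split(' ').map(_.toDouble)) expects one point per line with space-separated numeric features, so a tiny hypothetical input file would look like:

0.0 0.0
0.1 0.1
9.0 9.0
9.1 9.1

A comma-separated file would need split(',') instead.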
I submit the job with spark-submit, and here is the error I get:
java.lang.ClassNotFoundException: sparkKmeans
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
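A ClassNotFoundException at this stage usually means the sparkKmeans class never made it into the jar. Assuming the default sbt layout, one quick sanity check is to list the jar's contents:

jar tf target/scala-2.10/sparkkmeans_2.10-0.1-SNAPSHOT.jar | grep -i kmeans

If sparkKmeans.class is not listed, the source file was most likely not under src/main/scala, where sbt looks by default.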
I would appreciate it if anyone could help me.
Answer 0 (score: 0)
Thanks for your comment. I did what you said. My build.sbt file:

name := "sparkKmeans"

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.1",
"org.apache.spark" % "spark-mllib_2.10" % "1.6.1"
)
(I used Scala 2.11.8 and Spark 1.6.1, but I still get the same error.) One more point: I package my application with

sbt compile package
and run it with:
./bin/spark-submit --class sparkKmeans k/kmeans/target/scala-2.10/sparkkmeans_2.10-0.1-SNAPSHOT.jar '/home/meysam/spark-1.6.1/kmeans/pima.csv' 3
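One thing worth noting: in sbt, %% appends the project's Scala binary version to the artifact name, so with Scala 2.11.8 the first dependency resolves to spark-core_2.11 while spark-mllib_2.10 stays pinned to 2.10, and the scala-2.10 segment in the jar path above suggests the jar was in fact built for Scala 2.10. A consistent sketch, assuming the project stays on Scala 2.10 to match that path:

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1",
  "org.apache.spark" %% "spark-mllib" % "1.6.1"
)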