Spark-Submit Error: Unable to Load Main Class from JAR File

Date: 2018-11-08 07:18:18

Tags: scala apache-spark spark-submit

I am trying to spark-submit an application written in Scala in cluster mode. It runs fine in PySpark, but the above error pops up when I try to run it with Scala. If I have to add SBT and Maven dependencies, could you elaborate on the process? (I couldn't find it on Google.)

Here is my code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object MyFirst {
  def main(args: Array[String]): Unit = {
    // create Spark context with Spark configuration
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))

    // get threshold
    val threshold = args(1).toInt

    // read in text file and split each document into words
    val tokenized = sc.textFile(args(0)).flatMap(_.split(" "))

    // count the occurrence of each word
    val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)

    // filter out words with fewer than threshold occurrences
    val filtered = wordCounts.filter(_._2 >= threshold)

    // count characters
    val charCounts = filtered.flatMap(_._1.toCharArray).map((_, 1)).reduceByKey(_ + _)

    System.out.println(charCounts.collect().mkString(", "))
  }
}

Here is my build.sbt:

name := "MyFirst"

scalaVersion := "2.10.3"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
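
For reference, a fuller build.sbt might look like the sketch below. The version setting is an assumption (0.1.0-SNAPSHOT is sbt's default and matches the jar name passed to spark-submit), and marking spark-core as "provided" is a common choice since the cluster supplies Spark at runtime:

name := "MyFirst"

version := "0.1.0-SNAPSHOT"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"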

My spark-submit command is: spark-submit MyFirst --class MyFirst /home/ram/Downloads/sbt/src/target/scala-2.10/MyFirst_2.10-0.1.0-SNAPSHOT.jar

1 Answer:

Answer 0 (score: 0):

I've had this problem before.

Try defining a package for your class, like this:

package com.testing

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object MyFirst {
  def main(args: Array[String]): Unit = {
    // create Spark context with Spark configuration
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
    // more code ...
  }
}
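
With sbt's standard layout, this file would then live at src/main/scala/com/testing/MyFirst.scala (assuming the default sbt source directory conventions).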

Then use:

spark-submit --class com.testing.MyFirst /home/ram/Downloads/sbt/src/target/scala-2.10/MyFirst_2.10-0.1.0-SNAPSHOT.jar
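
Note that MyFirst reads args(0) as the input path and args(1) as the threshold, so application arguments go after the jar. For example (the input path and threshold here are placeholder values):

spark-submit --class com.testing.MyFirst /home/ram/Downloads/sbt/src/target/scala-2.10/MyFirst_2.10-0.1.0-SNAPSHOT.jar hdfs:///user/ram/input.txt 2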

Also make sure you've created a MANIFEST.MF file.

Here's a short example:

Manifest-Version: 1.0
Main-Class: com.testing.MyFirst
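
If you build with sbt, you can also have it write the Main-Class entry for you; a minimal sketch using the standard packageBin task (sbt 0.13 syntax):

mainClass in (Compile, packageBin) := Some("com.testing.MyFirst")

When --class is passed to spark-submit, the manifest entry is not strictly required, since spark-submit uses the class you specify.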