spark-submit ClassNotFoundException or NoClassDefFoundError

Asked: 2019-12-11 17:22:51

Tags: scala apache-spark intellij-idea jar spark-submit

I am developing an application with Scala + Spark. I can run the project without any problems, and I can build the .jar (either through IntelliJ or just with sbt). When I run:

spark-submit --class ngram.Ngram ngramgenerator.jar "./test2.txt" 3 "./output/"

I get the following output:

2019-12-11 17:53:37 WARN  Utils:66 - Your hostname, carlo-HP-Notebook resolves to a loopback address: 127.0.1.1; using 192.168.1.110 instead (on interface wlo1)
2019-12-11 17:53:37 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-12-11 17:53:37 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.ClassNotFoundException: ngram.Ngram
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2019-12-11 17:53:37 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-12-11 17:53:37 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-e980dcbd-5fd9-4d05-934c-87a0a2b8e4b1
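
A quick way to confirm whether the compiled main class actually ended up in the jar is to open it programmatically; this is a small standalone sketch of my own (it assumes ngramgenerator.jar sits in the working directory), not part of the build:

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object CheckJar {
  def main(args: Array[String]): Unit = {
    // Assumed location: the jar passed to spark-submit, in the current directory.
    val jar = new JarFile("ngramgenerator.jar")
    // spark-submit --class ngram.Ngram needs the entry ngram/Ngram.class to exist.
    val hasMainClass = jar.entries().asScala.exists(_.getName == "ngram/Ngram.class")
    println(s"ngram/Ngram.class present: $hasMainClass")
    jar.close()
  }
}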

My build.sbt:

scalaVersion     := "2.11.11"
version          := "1.0"
val sparkVersion = "2.4.3"
name := "ngramgenerator"
mainClass in (Compile, run) := Some("ngram.Ngram")

libraryDependencies ++= Seq("org.apache.spark" %% "spark-sql" % sparkVersion,
    "org.apache.spark" %% "spark-core" % sparkVersion
    )
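
For reference, a common alternative packaging setup builds a fat jar with the sbt-assembly plugin and marks the Spark dependencies as provided; this is only a rough sketch (the plugin and the assembly key are not part of my current build and would have to be added via project/plugins.sbt):

// project/plugins.sbt (assumption: the sbt-assembly plugin is used)
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt
scalaVersion := "2.11.11"
version      := "1.0"
name         := "ngramgenerator"

val sparkVersion = "2.4.3"

// Spark is already on the spark-submit classpath, so keep it out of the assembled jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
)

mainClass in assembly := Some("ngram.Ngram")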

My MANIFEST.MF:

Manifest-Version: 1.0
Class-Path: /home/carlo/Scrivania/PROGETTO scp/ngramgenerator/src/main/scala/ngram/Ngram.scala
Main-Class: ngram.Ngram

The class I want to run:

package ngram

import java.io.{FileNotFoundException, IOException}

import org.apache.spark.sql.SparkSession

object Ngram {

def tokenize(src: String) : Array[String] = {
    ("<s> " ++ src ++ " </s>").toLowerCase.replaceAll("""[[^a-zA-Z]+&&[^'<>/]]""", " ")
    .split(" ")
    .filter(_.nonEmpty)
    }

def getClosestMatch( m : Map[List[String], Int], elem : List[String]): (Int,Int) = {
    if ( m.contains(elem) ) (m(elem), elem.length)
    else getClosestMatch(m, elem.take(elem.length-1))
    }
    //while( ! m.contains(elem) ){
    //  elem.take(elem.length-1)
    //}
    //(m(elem), elem.length)

// args should contain:
//  - path to corpus
//  - maximum size of ngrams
//  - path to output

def main(args: Array[String]): Unit = {

    val spark = SparkSession
        .builder()
        .appName("Ngrams generator")
        .master("local[*]")
        .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    if (args.length ==0)
        println("Ngrams counter: \n Args should contain: \n - path to corpus \n - maximum size of ngrams \n - path to output ")
    else {
        val input =  spark.sparkContext.textFile(args(0)) //open file in args

        val words = input.map( tokenize(_).toList) // tokenize every sentence

        // val totWords = words.map(_.size).sum().toInt //count number of words in the document

        val maxn : Int = args(1).toInt //size of ngrams

        //val ngrams = (for(sentence <- words ;
        //                  ngramsInSentence <-
        //                    for( n <- 1 to maxn;
        //                       ngram <- sentence.sliding(n).toList)
        //                       yield ngram)
        //                  yield ngramsInSentence )

        val ngrams = words.flatMap(sentence => List.range(1, maxn).flatMap(n => sentence.sliding(n).toList) )

        //val ngramsCount = ngrams.groupBy(identity).mapValues(_.size ) //no MapReduce

        val ngramsCount = ngrams.map( x => (x, 1) ).reduceByKey(_ + _).persist() //with MapReduce

        val vocabularySize = ngramsCount.filter( _._1.length == 1 ).count()

        val copy = ngramsCount.filter( l => l._1.length < maxn  ).collect().toMap

        val ngramsProbability = ngramsCount
            .map{ case (k,v) if k.length > 1 =>
                    def elem = getClosestMatch(copy, k.take(k.length-1))
                    (k, math.pow(0.4, maxn - elem._2) * v.toDouble / elem._1 )
                  case (k,v) =>
                    (k, math.pow(0.4, maxn - 1) * v.toDouble / vocabularySize )
                }

        ngramsProbability.repartition(1).saveAsTextFile(args(2))
        println("Success")
        spark.stop
        }
}
}
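
To make the n-gram extraction step concrete, here is a small plain-Scala sketch (no Spark; the sample sentence is made up for illustration) of what the flatMap over sliding windows produces:

object SlidingExample {
  def main(args: Array[String]): Unit = {
    // A tokenized sentence, as tokenize() above would produce.
    val sentence = List("<s>", "the", "cat", "sat", "</s>")
    val maxn = 3

    // Mirrors the flatMap in main(); note that List.range(1, maxn) is 1 until maxn,
    // so with maxn = 3 this yields unigrams and bigrams only.
    val ngrams = List.range(1, maxn).flatMap(n => sentence.sliding(n).toList)
    ngrams.foreach(println)
    // List(<s>), List(the), ..., List(<s>, the), List(the, cat), ...
  }
}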

I tried with the IntelliJ IDE, but at first I just used sbt and followed this: https://spark.apache.org/docs/latest/quick-start.html

EDIT while trying to solve the issue:

I moved to a simpler project structure, using only sbt (no IntelliJ), which is essentially a copy of what is written at https://spark.apache.org/docs/latest/quick-start.html, but with my code. I run clean package to build the jar, then run spark-submit. The output is as follows:

Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
at ngram.Ngram$.main(Ngram.scala:55)
at ngram.Ngram.main(Ngram.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
... 12 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.java8.JFunction2$mcIII$sp
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 12 more

1 Answer:

Answer 0 (score: 1):

Caused by: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp ... 12 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.java8.JFunction2$mcIII$sp at

You are missing Java 8 classes at runtime.

Are you sure you have Java 8 on the cluster, and that you are actually using it?
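
One simple way to check is to print the runtime's version properties from inside the Spark application itself, since spark-submit may pick up a different JVM than your shell; this is only an illustrative sketch, not part of the original answer:

// Illustrative sketch: report which JVM and scala-library spark-submit actually runs on.
object PrintRuntimeVersions {
  def main(args: Array[String]): Unit = {
    println("java.version  = " + System.getProperty("java.version"))
    println("java.home     = " + System.getProperty("java.home"))
    println("scala version = " + scala.util.Properties.versionString)
  }
}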

Update

Take a look at this question.