I am developing an application with Scala + Spark. I can run the project without any problem, and I can generate the .jar (through IntelliJ or just with sbt). When I run:
spark-submit --class ngram.Ngram ngramgenerator.jar "./test2.txt" 3 "./output/"
I get the following output:
2019-12-11 17:53:37 WARN Utils:66 - Your hostname, carlo-HP-Notebook resolves to a loopback address: 127.0.1.1; using 192.168.1.110 instead (on interface wlo1)
2019-12-11 17:53:37 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-12-11 17:53:37 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.ClassNotFoundException: ngram.Ngram
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2019-12-11 17:53:37 INFO ShutdownHookManager:54 - Shutdown hook called
2019-12-11 17:53:37 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-e980dcbd-5fd9-4d05-934c-87a0a2b8e4b1
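One way to check whether ngram.Ngram actually ended up inside the jar passed to spark-submit is to list the jar entries. Below is a small diagnostic sketch of mine (not part of the original setup; the jar path is an assumption):

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object CheckJar {
  def main(args: Array[String]): Unit = {
    // Path is an assumption; point it at the jar given to spark-submit.
    val jar = new JarFile("ngramgenerator.jar")
    val found = jar.entries().asScala.exists(_.getName == "ngram/Ngram.class")
    println(s"ngram/Ngram.class present: $found")
    jar.close()
  }
}

If the entry is missing, the jar was packaged without the compiled classes, which would explain the ClassNotFoundException.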
My build.sbt:
scalaVersion := "2.11.11"
version := "1.0"
val sparkVersion = "2.4.3"
name := "ngramgenerator"
mainClass in (Compile, run) := Some("ngram.Ngram")
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-core" % sparkVersion
)
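For reference, here is a minimal sketch of the same build file that also records the main class in the manifest of the jar produced by sbt package (this is a variant of mine, not part of the original project; mainClass in (Compile, run) only affects sbt run, while the packageBin scope controls the packaged jar):

scalaVersion := "2.11.11"
version := "1.0"
name := "ngramgenerator"

val sparkVersion = "2.4.3"

// packageBin sets Main-Class in the manifest of the jar built by sbt package;
// the run scope only affects sbt run.
mainClass in (Compile, packageBin) := Some("ngram.Ngram")
mainClass in (Compile, run) := Some("ngram.Ngram")

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-core" % sparkVersion
)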
My MANIFEST.MF:
Manifest-Version: 1.0
Class-Path: /home/carlo/Scrivania/PROGETTO scp/ngramgenerator/src/main/scala/ngram/Ngram.scala
Main-Class: ngram.Ngram
The class I want to run:
package ngram
import java.io.{FileNotFoundException, IOException}
import org.apache.spark.sql.SparkSession
object Ngram {

  def tokenize(src: String) : Array[String] = {
    ("<s> " ++ src ++ " </s>").toLowerCase.replaceAll("""[[^a-zA-Z]+&&[^'<>/]]""", " ")
      .split(" ")
      .filter(_.nonEmpty)
  }

  def getClosestMatch( m : Map[List[String], Int], elem : List[String]): (Int,Int) = {
    if ( m.contains(elem) ) (m(elem), elem.length)
    else getClosestMatch(m, elem.take(elem.length-1))
  }
  //while( ! m.contains(elem) ){
  //  elem.take(elem.length-1)
  //}
  //(m(elem), elem.length)

  // args should contain:
  // - path to corpus
  // - maximum size of ngrams
  // - path to output
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Ngrams generator")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")

    if (args.length == 0)
      println("Ngrams counter: \n Args should contain: \n - path to corpus \n - maximum size of ngrams \n - path to output ")
    else {
      val input = spark.sparkContext.textFile(args(0)) // open file in args
      val words = input.map( tokenize(_).toList)       // tokenize every sentence
      // val totWords = words.map(_.size).sum().toInt  // count number of words in the document
      val maxn : Int = args(1).toInt                   // size of ngrams
      //val ngrams = (for(sentence <- words ;
      //                  ngramsInSentence <-
      //                    for( n <- 1 to maxn;
      //                         ngram <- sentence.sliding(n).toList)
      //                    yield ngram)
      //              yield ngramsInSentence )
      val ngrams = words.flatMap(sentence => List.range(1, maxn).flatMap(n => sentence.sliding(n).toList) )
      //val ngramsCount = ngrams.groupBy(identity).mapValues(_.size ) // no MapReduce
      val ngramsCount = ngrams.map( x => (x, 1) ).reduceByKey(_ + _).persist() // with MapReduce
      val vocabularySize = ngramsCount.filter( _._1.length == 1 ).count()
      val copy = ngramsCount.filter( l => l._1.length < maxn ).collect().toMap
      val ngramsProbability = ngramsCount
        .map{
          case (k,v) if k.length > 1 =>
            def elem = getClosestMatch(copy, k.take(k.length-1))
            (k, math.pow(0.4, maxn - elem._2) * v.toDouble / elem._1 )
          case (k,v) =>
            (k, math.pow(0.4, maxn - 1) * v.toDouble / vocabularySize )
        }
      ngramsProbability.repartition(1).saveAsTextFile(args(2))
      println("Success")
      spark.stop
    }
  }
}
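To make the backoff in getClosestMatch easier to follow, here is a small standalone illustration with made-up counts (the demo object and its values are hypothetical, added only for clarity):

package ngram

object GetClosestMatchDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical counts: "the" seen 5 times, "the cat" seen 2 times.
    val counts = Map(List("the") -> 5, List("the", "cat") -> 2)
    // Exact match: returns (count, length of the matched ngram).
    println(Ngram.getClosestMatch(counts, List("the", "cat"))) // (2,2)
    // List("the", "dog") is absent, so the lookup backs off to List("the").
    println(Ngram.getClosestMatch(counts, List("the", "dog"))) // (5,1)
  }
}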
I tried with the IntelliJ IDE, but at first I just used sbt and followed this guide: https://spark.apache.org/docs/latest/quick-start.html
EDIT (made while trying to solve the problem)
I moved to a simpler project structure, using only sbt (no IntelliJ); it is essentially a copy of what is written at https://spark.apache.org/docs/latest/quick-start.html, but with my code.
I run clean package, take the resulting jar, and then run spark-submit. The output is as follows:
Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
at ngram.Ngram$.main(Ngram.scala:55)
at ngram.Ngram.main(Ngram.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
... 12 more
Caused by: java.lang.ClassNotFoundException: scala.runtime.java8.JFunction2$mcIII$sp
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 12 more
Answer 0 (score: 1)
Caused by: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp ... 12 more Caused by: java.lang.ClassNotFoundException: scala.runtime.java8.JFunction2$mcIII$sp at
You are missing a Java 8 class at runtime.
Are you sure you have Java 8 on the cluster, and that you are actually using it?
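A quick way to check which Java runtime the application actually runs on is to print the JVM system properties from inside the job, for example at the top of Ngram.main (a small sketch, not part of the original answer):

// Prints the Java version and installation used by the driver JVM.
println(s"java.version = ${System.getProperty("java.version")}")
println(s"java.home    = ${System.getProperty("java.home")}")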
UPDATE
Take a look at this question.